Matrix algebra in R
Hi all,
To get us started off with some quick topic presentations at R Club this term, I’ll go over a little matrix algebra on Thursday. Those of you who have taken PSY613 may remember doing similar exercises in SPSS, in MATLAB, and/or by hand. Those of you who are in PSY613 currently will remember this from, well, the future, since we’ll be working on matrix algebra in lab on Friday.
Here’s a link to the code for an example: matrix_algebra.r
See you Thursday at 12pm in Franklin 271A!
# awesome resources:
# http://www.ats.ucla.edu/stat/r/library/matrix_alg.htm
# http://www.statmethods.net/advstats/matrix.html
# matrix algebra in R. GLM for a t-test.
?matrix()
# define DV vector
DV <- matrix(data=seq(1:10), nrow=10, ncol=1)
# matrix(data, nrow, ncol, byrow). in this case, the data are defined by seq(1:10)
# check your work. DV should be one column of 10 observations.
DV
class(DV)
# specify design matrix. this is a t-test, so there are two groups, each n=5
# since the data for this matrix is a little harder to specify quickly, we'll do it first and then plug it into the matrix function, so it's not too messy.
?rep()
group1 <- c(rep(1,5), rep(0,5))
group2 <- c(rep(0,5), rep(1,5))
X <- matrix(data=c(group1,group2), nrow=10, ncol=2)
# the function matrix defaults to filling in by column (byrow=FALSE), which is what works best here. to fill in by row, byrow=TRUE
# check your work. X should be two columns of 10 observations, with 1's for the first 5 observations in column 1 and 0's in column 2, and vice versa for the second 5 observations.
X
# GLM: DV = Xb + e. We have X and DV now, so we can solve for the parameters (betas) and the error for each observation.
# assume e is zero (which is is, on average), then solve for b with X^-1 * DV
n <- nrow(X) # number of observations
p <- ncol(X) # number of paramaters
df_m <- p-1
df_e <- n-p
# X isn't square, so X^-1 is not possible. Get the pseudoinverse instead. Load MASS package to get the ginv() function
library(MASS)
X_inv <- ginv(X)
# R defaults to element-by-element multiplication instead of matrix multiplication. Check it out:
betas <- X_inv*DV
# oh no! that's not right.
betas <- X_inv%*%DV
# yay! use %*% instead of * to specify matrix algebra.
# since this is a t-test, there are two betas (2x1 matrix), one for each group mean
dim(betas)
# to get SS model, calculate the predicted score for each observation first
model <- X%*%betas
gm <- colMeans(DV) # grand mean
SSM <- sum((model-gm)^2)
MSM <- SSM/df_m
# to get SS error, calculate error/residual for each observation (observed-predicted)
errors <- DV-model
SSE <- sum(errors^2)
MSE <- SSE/df_e
F <- MSM/MSE
p_val <- 1- pf(F, df_m, df_e)