April 1, 2014

Matrix algebra in R

Hi all,

To get us started off with some quick topic presentations at R Club this term, I’ll go over a little matrix algebra on Thursday. Those of you who have taken PSY613 may remember doing similar exercises in SPSS, in MATLAB, and/or by hand. Those of you who are in PSY613 currently will remember this from, well, the future, since we’ll be working on matrix algebra in lab on Friday.

Here’s a link to the code for an example: matrix_algebra.r

See you Thursday at 12pm in Franklin 271A!

# awesome resources: # http://www.ats.ucla.edu/stat/r/library/matrix_alg.htm # http://www.statmethods.net/advstats/matrix.html


# matrix algebra in R. GLM for a t-test.

?matrix() # define DV vector DV <- matrix(data=seq(1:10), nrow=10, ncol=1) # matrix(data, nrow, ncol, byrow). in this case, the data are defined by seq(1:10) # check your work. DV should be one column of 10 observations. DV class(DV) # specify design matrix. this is a t-test, so there are two groups, each n=5 # since the data for this matrix is a little harder to specify quickly, we'll do it first and then plug it into the matrix function, so it's not too messy. ?rep() group1 <- c(rep(1,5), rep(0,5)) group2 <- c(rep(0,5), rep(1,5)) X <- matrix(data=c(group1,group2), nrow=10, ncol=2) # the function matrix defaults to filling in by column (byrow=FALSE), which is what works best here. to fill in by row, byrow=TRUE # check your work. X should be two columns of 10 observations, with 1's for the first 5 observations in column 1 and 0's in column 2, and vice versa for the second 5 observations. X # GLM: DV = Xb + e. We have X and DV now, so we can solve for the parameters (betas) and the error for each observation. # assume e is zero (which is is, on average), then solve for b with X^-1 * DV n <- nrow(X) # number of observations p <- ncol(X) # number of paramaters df_m <- p-1 df_e <- n-p # X isn't square, so X^-1 is not possible. Get the pseudoinverse instead. Load MASS package to get the ginv() function library(MASS) X_inv <- ginv(X) # R defaults to element-by-element multiplication instead of matrix multiplication. Check it out: betas <- X_inv*DV # oh no! that's not right. betas <- X_inv%*%DV # yay! use %*% instead of * to specify matrix algebra. # since this is a t-test, there are two betas (2x1 matrix), one for each group mean dim(betas) # to get SS model, calculate the predicted score for each observation first model <- X%*%betas gm <- colMeans(DV) # grand mean SSM <- sum((model-gm)^2) MSM <- SSM/df_m # to get SS error, calculate error/residual for each observation (observed-predicted) errors <- DV-model SSE <- sum(errors^2) MSE <- SSE/df_e F <- MSM/MSE p_val <- 1- pf(F, df_m, df_e)