Latent Growth Curves in R
This is based in part on Nicole’s code found in this post.
First, reading in the data:
library(foreign)
library(dplyr)
library(tidyr)
library(ggplot2)
library(lavaan)
library(semPlot)
library(knitr)
setwd('/home/jflournoy/code/sem_in_r/')
pdr2<-read.spss("PDR Wave 2.sav", to.data.frame=T)
# pdr4<-read.spss("PDR Wave 4.sav", to.data.frame=T)
Generate a time variable
This indexes each call for each family.
pdr2_time <- pdr2 %>%
group_by(FAMILY) %>% #do the count by family
arrange(YEAR,MONTH,DAY) %>% #sort by date
mutate(callindex=1:n()) #create call index that's 1:end for each family
head(pdr2_time[,c('FAMILY','callindex')])
## Source: local data frame [6 x 2]
## Groups: FAMILY
##
## FAMILY callindex
## 1 TP001 1
## 2 TP001 2
## 3 TP001 3
## 4 TP001 4
## 5 TP001 5
## 6 TP001 6
Create composite score of kid bex
pdr2_time_bxtrans<-pdr2_time %>% ungroup %>%
mutate_each(
funs(as.numeric(!(.=='DID NOT OCCUR'))),
P31201:P31240)
To break down the above statement:
pdr2_time
gets sent to ungroup, which removes the grouping by FAMILY
we did above, and that gets sent to mutate_each
. This takes a range of columns that we define in the second argument as P31201:P31240
which reads ‘from P31201 to P31240’.
The meat of mutate_each
is what goes in the first argument, within the function funs()
. You can list a bunch of functions here if you wanted to mutate all the columns in a number of different ways. The period character, .
, represents the column that will be passed to that function. In this case, we just check if each element of the column is ‘DID NOT OCCUR’, and if so, we negate it (giving us FALSE
) and then as.numeric
it giving us ‘0’. If the response is any other option, we get a ‘1’, which is what we want. Importantly, this will return NA
if the data is NA
.
If you want to learn more ?mutate_each
. Moving on now…
head(pdr2_time_bxtrans)
## Source: local data frame [6 x 63]
##
## FAMILY RESP MONTH DAY YEAR INT WEEKDAY P31201 P31202 P31203 P31204
## 1 TP001 3 7 22 2004 4H WEDNESDAY 0 1 1 0
## 2 TP001 3 7 23 2004 4H THURSAY 0 1 1 0
## 3 TP001 3 7 27 2004 4H MONDAY 0 1 1 0
## 4 TP001 3 7 28 2004 4H TUESDAY 0 1 1 0
## 5 TP001 3 8 3 2004 4H MONDAY 0 1 1 0
## 6 TP001 3 8 4 2004 4H TUESDAY 0 1 1 0
## Variables not shown: P31205 (dbl), P31206 (dbl), P31207 (dbl), P31208
## (dbl), P31209 (dbl), P31210 (dbl), P31211 (dbl), P31212 (dbl), P31213
## (dbl), P31214 (dbl), P31215 (dbl), P31216 (dbl), P31217 (dbl), P31218
## (dbl), P31219 (dbl), P31220 (dbl), P31221 (dbl), P31222 (dbl), P31223
## (dbl), P31224 (dbl), P31225 (dbl), P31226 (dbl), P31227 (dbl), P31228
## (dbl), P31229 (dbl), P31230 (dbl), P31231 (dbl), P31232 (dbl), P31233
## (dbl), P31234 (dbl), P31235 (dbl), P31236 (dbl), P31237 (dbl), P31238
## (dbl), P31239 (dbl), P31240 (dbl), P31241 (fctr), P31242 (fctr), P31242A
## (fctr), P31242B (fctr), P31242C (fctr), P31242D (fctr), P31243 (fctr),
## P31243A (fctr), P31243B (fctr), P31243C (fctr), P31243D (fctr), P31244
## (dbl), P31245 (dbl), WAVE (dbl), PILOT1 (fctr), callindex (int)
Now we can create the composite variable using a sum. There is missing data, so that should be delt with, but we’ll ignore that for now.
pdr2_time_bxtrans$bextot<-pdr2_time_bxtrans %>%
select(P31201:P31240) %>% rowSums(na.rm=T)
summary(pdr2_time_bxtrans$bextot)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.000 4.000 5.307 8.000 26.000
hist(pdr2_time_bxtrans$bextot)
We have a call-index varibale, but if you want a time index that is really the number of days since they were first contacted, here’s how to do that.
pdr2_time_bxtrans_t2<-
pdr2_time_bxtrans %>%
mutate(formd_date=as.Date(paste(MONTH,DAY,YEAR,sep='/'),"%m/%d/%Y")) %>%
group_by(FAMILY) %>%
arrange(YEAR,MONTH,DAY) %>%
mutate(days_since_1st=formd_date-min(formd_date))
Some descriptive plots
Let’s see what we’re working with.
First, the raw data – a line for every family:
pdr2_time_bxtrans_t2 %>% filter(callindex < 33) %>%
ggplot(aes(x=callindex,y=bextot))+
geom_line(aes(group=FAMILY),alpha=.1)+
theme(panel.background=element_rect(fill='white'))
Next, a smoothed loess curve for every family:
pdr2_time_bxtrans_t2 %>% filter(callindex < 33) %>%
ggplot(aes(x=callindex,y=bextot))+
geom_smooth(aes(group=FAMILY),method=loess,se=F)+
theme(panel.background=element_rect(fill='white'))
Finally, the first plot again, but with a group-level smoothed curve – a line, and a second and third degree polynomial just to see what that looks like:
pdr2_time_bxtrans_t2 %>% filter(callindex < 33) %>%
ggplot(aes(x=callindex,y=bextot))+
geom_line(aes(group=FAMILY),alpha=.05)+
geom_smooth(method=lm,formula=y~poly(x,1),color='dark orange',se=F)+
geom_smooth(method=lm,formula=y~poly(x,2),color='red',se=F)+
geom_smooth(method=lm,formula=y~poly(x,3),color='black',se=F)+
theme(panel.background=element_rect(fill='white'))
From the plot it looks like either a linear or quadratic would capture this pretty well.
Widen the data by call index
Now we can make a wide data file to examine the trend of bextot.
pdr2_bex_w_by_index<-
pdr2_time_bxtrans_t2 %>% select(FAMILY,bextot,callindex) %>%
filter(callindex<33) %>% # just the first 32 to minimize missingness
mutate(callindex=paste('call',formatC(callindex, width = 2, format = "d", flag = "0"),sep='_')) %>%
spread(callindex,bextot)
Make R make the lavaan model code
let’s make the lavaan code. We don’t want to write out each ‘call_N’ var, so we can do it programatically
(allthecallnames<-grep('call',names(pdr2_bex_w_by_index),value=T))
## [1] "call_01" "call_02" "call_03" "call_04" "call_05" "call_06" "call_07"
## [8] "call_08" "call_09" "call_10" "call_11" "call_12" "call_13" "call_14"
## [15] "call_15" "call_16" "call_17" "call_18" "call_19" "call_20" "call_21"
## [22] "call_22" "call_23" "call_24" "call_25" "call_26" "call_27" "call_28"
## [29] "call_29" "call_30" "call_31" "call_32"
(callnameswith_i_weights<-paste('1*',allthecallnames,sep=''))
## [1] "1*call_01" "1*call_02" "1*call_03" "1*call_04" "1*call_05"
## [6] "1*call_06" "1*call_07" "1*call_08" "1*call_09" "1*call_10"
## [11] "1*call_11" "1*call_12" "1*call_13" "1*call_14" "1*call_15"
## [16] "1*call_16" "1*call_17" "1*call_18" "1*call_19" "1*call_20"
## [21] "1*call_21" "1*call_22" "1*call_23" "1*call_24" "1*call_25"
## [26] "1*call_26" "1*call_27" "1*call_28" "1*call_29" "1*call_30"
## [31] "1*call_31" "1*call_32"
Centering is really important for interpretting the linear effect in the presence of a quadratic. The intercept and linear slope term here are interpreted only at the first timepoint, though one can easily calculate the expected value of a point at any timepoint.
(callnameswith_s_weights<-paste(0:31,'*',allthecallnames,sep=''))
## [1] "0*call_01" "1*call_02" "2*call_03" "3*call_04" "4*call_05"
## [6] "5*call_06" "6*call_07" "7*call_08" "8*call_09" "9*call_10"
## [11] "10*call_11" "11*call_12" "12*call_13" "13*call_14" "14*call_15"
## [16] "15*call_16" "16*call_17" "17*call_18" "18*call_19" "19*call_20"
## [21] "20*call_21" "21*call_22" "22*call_23" "23*call_24" "24*call_25"
## [26] "25*call_26" "26*call_27" "27*call_28" "28*call_29" "29*call_30"
## [31] "30*call_31" "31*call_32"
(callnameswith_q_weights<-paste((0:31)^2,'*',allthecallnames,sep=''))
## [1] "0*call_01" "1*call_02" "4*call_03" "9*call_04" "16*call_05"
## [6] "25*call_06" "36*call_07" "49*call_08" "64*call_09" "81*call_10"
## [11] "100*call_11" "121*call_12" "144*call_13" "169*call_14" "196*call_15"
## [16] "225*call_16" "256*call_17" "289*call_18" "324*call_19" "361*call_20"
## [21] "400*call_21" "441*call_22" "484*call_23" "529*call_24" "576*call_25"
## [26] "625*call_26" "676*call_27" "729*call_28" "784*call_29" "841*call_30"
## [31] "900*call_31" "961*call_32"
(i_weights_collapsed<-paste(callnameswith_i_weights,collapse=' + '))
## [1] "1*call_01 + 1*call_02 + 1*call_03 + 1*call_04 + 1*call_05 + 1*call_06 + 1*call_07 + 1*call_08 + 1*call_09 + 1*call_10 + 1*call_11 + 1*call_12 + 1*call_13 + 1*call_14 + 1*call_15 + 1*call_16 + 1*call_17 + 1*call_18 + 1*call_19 + 1*call_20 + 1*call_21 + 1*call_22 + 1*call_23 + 1*call_24 + 1*call_25 + 1*call_26 + 1*call_27 + 1*call_28 + 1*call_29 + 1*call_30 + 1*call_31 + 1*call_32"
(s_weights_collapsed<-paste(callnameswith_s_weights,collapse=' + '))
## [1] "0*call_01 + 1*call_02 + 2*call_03 + 3*call_04 + 4*call_05 + 5*call_06 + 6*call_07 + 7*call_08 + 8*call_09 + 9*call_10 + 10*call_11 + 11*call_12 + 12*call_13 + 13*call_14 + 14*call_15 + 15*call_16 + 16*call_17 + 17*call_18 + 18*call_19 + 19*call_20 + 20*call_21 + 21*call_22 + 22*call_23 + 23*call_24 + 24*call_25 + 25*call_26 + 26*call_27 + 27*call_28 + 28*call_29 + 29*call_30 + 30*call_31 + 31*call_32"
(q_weights_collapsed<-paste(callnameswith_q_weights,collapse=' + '))
## [1] "0*call_01 + 1*call_02 + 4*call_03 + 9*call_04 + 16*call_05 + 25*call_06 + 36*call_07 + 49*call_08 + 64*call_09 + 81*call_10 + 100*call_11 + 121*call_12 + 144*call_13 + 169*call_14 + 196*call_15 + 225*call_16 + 256*call_17 + 289*call_18 + 324*call_19 + 361*call_20 + 400*call_21 + 441*call_22 + 484*call_23 + 529*call_24 + 576*call_25 + 625*call_26 + 676*call_27 + 729*call_28 + 784*call_29 + 841*call_30 + 900*call_31 + 961*call_32"
model <- paste(
' i =~ ',i_weights_collapsed,'\n\n',
' s =~ ',s_weights_collapsed,'\n\n',
' q =~ ',q_weights_collapsed,sep='')
cat(model)
## i =~ 1*call_01 + 1*call_02 + 1*call_03 + 1*call_04 + 1*call_05 + 1*call_06 + 1*call_07 + 1*call_08 + 1*call_09 + 1*call_10 + 1*call_11 + 1*call_12 + 1*call_13 + 1*call_14 + 1*call_15 + 1*call_16 + 1*call_17 + 1*call_18 + 1*call_19 + 1*call_20 + 1*call_21 + 1*call_22 + 1*call_23 + 1*call_24 + 1*call_25 + 1*call_26 + 1*call_27 + 1*call_28 + 1*call_29 + 1*call_30 + 1*call_31 + 1*call_32
##
## s =~ 0*call_01 + 1*call_02 + 2*call_03 + 3*call_04 + 4*call_05 + 5*call_06 + 6*call_07 + 7*call_08 + 8*call_09 + 9*call_10 + 10*call_11 + 11*call_12 + 12*call_13 + 13*call_14 + 14*call_15 + 15*call_16 + 16*call_17 + 17*call_18 + 18*call_19 + 19*call_20 + 20*call_21 + 21*call_22 + 22*call_23 + 23*call_24 + 24*call_25 + 25*call_26 + 26*call_27 + 27*call_28 + 28*call_29 + 29*call_30 + 30*call_31 + 31*call_32
##
## q =~ 0*call_01 + 1*call_02 + 4*call_03 + 9*call_04 + 16*call_05 + 25*call_06 + 36*call_07 + 49*call_08 + 64*call_09 + 81*call_10 + 100*call_11 + 121*call_12 + 144*call_13 + 169*call_14 + 196*call_15 + 225*call_16 + 256*call_17 + 289*call_18 + 324*call_19 + 361*call_20 + 400*call_21 + 441*call_22 + 484*call_23 + 529*call_24 + 576*call_25 + 625*call_26 + 676*call_27 + 729*call_28 + 784*call_29 + 841*call_30 + 900*call_31 + 961*call_32
Fit the model!
fit <- growth(model, data=pdr2_bex_w_by_index)
Here’s the model we fit:
semPaths(fit)
And here’s the summary:
summary(fit)
## lavaan (0.5-18) converged normally after 169 iterations
##
## Used Total
## Number of observations 72 100
##
## Estimator ML
## Minimum Function Test Statistic 838.225
## Degrees of freedom 519
## P-value (Chi-square) 0.000
##
## Parameter estimates:
##
## Information Expected
## Standard Errors Standard
##
## Estimate Std.err Z-value P(>|z|)
## Latent variables:
## i =~
## call_01 1.000
## call_02 1.000
## call_03 1.000
## call_04 1.000
## call_05 1.000
## call_06 1.000
## call_07 1.000
## call_08 1.000
## call_09 1.000
## call_10 1.000
## call_11 1.000
## call_12 1.000
## call_13 1.000
## call_14 1.000
## call_15 1.000
## call_16 1.000
## call_17 1.000
## call_18 1.000
## call_19 1.000
## call_20 1.000
## call_21 1.000
## call_22 1.000
## call_23 1.000
## call_24 1.000
## call_25 1.000
## call_26 1.000
## call_27 1.000
## call_28 1.000
## call_29 1.000
## call_30 1.000
## call_31 1.000
## call_32 1.000
## s =~
## call_01 0.000
## call_02 1.000
## call_03 2.000
## call_04 3.000
## call_05 4.000
## call_06 5.000
## call_07 6.000
## call_08 7.000
## call_09 8.000
## call_10 9.000
## call_11 10.000
## call_12 11.000
## call_13 12.000
## call_14 13.000
## call_15 14.000
## call_16 15.000
## call_17 16.000
## call_18 17.000
## call_19 18.000
## call_20 19.000
## call_21 20.000
## call_22 21.000
## call_23 22.000
## call_24 23.000
## call_25 24.000
## call_26 25.000
## call_27 26.000
## call_28 27.000
## call_29 28.000
## call_30 29.000
## call_31 30.000
## call_32 31.000
## q =~
## call_01 0.000
## call_02 1.000
## call_03 4.000
## call_04 9.000
## call_05 16.000
## call_06 25.000
## call_07 36.000
## call_08 49.000
## call_09 64.000
## call_10 81.000
## call_11 100.000
## call_12 121.000
## call_13 144.000
## call_14 169.000
## call_15 196.000
## call_16 225.000
## call_17 256.000
## call_18 289.000
## call_19 324.000
## call_20 361.000
## call_21 400.000
## call_22 441.000
## call_23 484.000
## call_24 529.000
## call_25 576.000
## call_26 625.000
## call_27 676.000
## call_28 729.000
## call_29 784.000
## call_30 841.000
## call_31 900.000
## call_32 961.000
##
## Covariances:
## i ~~
## s -0.266 0.154 -1.721 0.085
## q 0.004 0.004 1.008 0.313
## s ~~
## q -0.001 0.000 -2.018 0.044
##
## Intercepts:
## call_01 0.000
## call_02 0.000
## call_03 0.000
## call_04 0.000
## call_05 0.000
## call_06 0.000
## call_07 0.000
## call_08 0.000
## call_09 0.000
## call_10 0.000
## call_11 0.000
## call_12 0.000
## call_13 0.000
## call_14 0.000
## call_15 0.000
## call_16 0.000
## call_17 0.000
## call_18 0.000
## call_19 0.000
## call_20 0.000
## call_21 0.000
## call_22 0.000
## call_23 0.000
## call_24 0.000
## call_25 0.000
## call_26 0.000
## call_27 0.000
## call_28 0.000
## call_29 0.000
## call_30 0.000
## call_31 0.000
## call_32 0.000
## i 7.402 0.417 17.760 0.000
## s -0.261 0.038 -6.913 0.000
## q 0.005 0.001 5.015 0.000
##
## Variances:
## call_01 16.731 3.107
## call_02 9.807 1.915
## call_03 11.202 2.087
## call_04 9.089 1.700
## call_05 17.265 3.023
## call_06 15.249 2.667
## call_07 18.412 3.178
## call_08 15.511 2.684
## call_09 16.776 2.886
## call_10 10.061 1.764
## call_11 12.751 2.208
## call_12 14.664 2.524
## call_13 12.104 2.097
## call_14 13.257 2.289
## call_15 11.621 2.017
## call_16 7.045 1.257
## call_17 10.488 1.828
## call_18 11.085 1.927
## call_19 9.291 1.628
## call_20 8.794 1.545
## call_21 11.203 1.945
## call_22 11.655 2.020
## call_23 7.416 1.316
## call_24 7.003 1.249
## call_25 10.300 1.800
## call_26 13.740 2.377
## call_27 13.083 2.275
## call_28 8.501 1.526
## call_29 8.287 1.507
## call_30 8.291 1.533
## call_31 7.677 1.469
## call_32 10.034 1.899
## i 9.498 2.095
## s 0.042 0.017
## q 0.000 0.000
Starting with this growth model, we could now add in predictors and outcomes for the latent i
, s
, and q
variables.
// add bootstrap table styles to pandoc tables $(document).ready(function () { $('tr.header').parent('thead').parent('table').addClass('table table-condensed'); });
This is awesome!! Thanks, John!