April 25, 2015

Latent Growth Curves in R

This is based in part on Nicole’s code found in this post.

sem_in_r.r

First, reading in the data:

library(foreign)
library(dplyr)
library(tidyr)
library(ggplot2)
library(lavaan)
library(semPlot)
library(knitr)

setwd('/home/jflournoy/code/sem_in_r/')

pdr2<-read.spss("PDR Wave 2.sav", to.data.frame=T)
# pdr4<-read.spss("PDR Wave 4.sav", to.data.frame=T)

Generate a time variable

This indexes each call for each family.

pdr2_time <- pdr2 %>%
group_by(FAMILY) %>% #do the count by family
  arrange(YEAR,MONTH,DAY) %>% #sort by date
  mutate(callindex=1:n()) #create call index that's 1:end for each family

head(pdr2_time[,c('FAMILY','callindex')])

## Source: local data frame [6 x 2]
## Groups: FAMILY
## 
##   FAMILY callindex
## 1  TP001         1
## 2  TP001         2
## 3  TP001         3
## 4  TP001         4
## 5  TP001         5
## 6  TP001         6

Create composite score of kid bex

pdr2_time_bxtrans<-pdr2_time %>% ungroup %>% 
  mutate_each(
    funs(as.numeric(!(.=='DID NOT OCCUR'))),
    P31201:P31240)

To break down the above statement:

pdr2_time gets sent to ungroup, which removes the grouping by FAMILY we did above, and that gets sent to mutate_each. This takes a range of columns that we define in the second argument as P31201:P31240 which reads ‘from P31201 to P31240’.

The meat of mutate_each is what goes in the first argument, within the function funs(). You can list a bunch of functions here if you wanted to mutate all the columns in a number of different ways. The period character, ., represents the column that will be passed to that function. In this case, we just check if each element of the column is ‘DID NOT OCCUR’, and if so, we negate it (giving us FALSE) and then as.numeric it giving us ‘0’. If the response is any other option, we get a ‘1’, which is what we want. Importantly, this will return NA if the data is NA.

If you want to learn more ?mutate_each. Moving on now…

head(pdr2_time_bxtrans)

## Source: local data frame [6 x 63]
## 
##   FAMILY RESP MONTH DAY YEAR INT   WEEKDAY P31201 P31202 P31203 P31204
## 1  TP001    3     7  22 2004  4H WEDNESDAY      0      1      1      0
## 2  TP001    3     7  23 2004  4H   THURSAY      0      1      1      0
## 3  TP001    3     7  27 2004  4H    MONDAY      0      1      1      0
## 4  TP001    3     7  28 2004  4H   TUESDAY      0      1      1      0
## 5  TP001    3     8   3 2004  4H    MONDAY      0      1      1      0
## 6  TP001    3     8   4 2004  4H   TUESDAY      0      1      1      0
## Variables not shown: P31205 (dbl), P31206 (dbl), P31207 (dbl), P31208
##   (dbl), P31209 (dbl), P31210 (dbl), P31211 (dbl), P31212 (dbl), P31213
##   (dbl), P31214 (dbl), P31215 (dbl), P31216 (dbl), P31217 (dbl), P31218
##   (dbl), P31219 (dbl), P31220 (dbl), P31221 (dbl), P31222 (dbl), P31223
##   (dbl), P31224 (dbl), P31225 (dbl), P31226 (dbl), P31227 (dbl), P31228
##   (dbl), P31229 (dbl), P31230 (dbl), P31231 (dbl), P31232 (dbl), P31233
##   (dbl), P31234 (dbl), P31235 (dbl), P31236 (dbl), P31237 (dbl), P31238
##   (dbl), P31239 (dbl), P31240 (dbl), P31241 (fctr), P31242 (fctr), P31242A
##   (fctr), P31242B (fctr), P31242C (fctr), P31242D (fctr), P31243 (fctr),
##   P31243A (fctr), P31243B (fctr), P31243C (fctr), P31243D (fctr), P31244
##   (dbl), P31245 (dbl), WAVE (dbl), PILOT1 (fctr), callindex (int)

Now we can create the composite variable using a sum. There is missing data, so that should be delt with, but we’ll ignore that for now.

pdr2_time_bxtrans$bextot<-pdr2_time_bxtrans %>%
  select(P31201:P31240) %>% rowSums(na.rm=T)

summary(pdr2_time_bxtrans$bextot)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.000   4.000   5.307   8.000  26.000

hist(pdr2_time_bxtrans$bextot)

We have a call-index varibale, but if you want a time index that is really the number of days since they were first contacted, here’s how to do that.

pdr2_time_bxtrans_t2<-
  pdr2_time_bxtrans %>% 
  mutate(formd_date=as.Date(paste(MONTH,DAY,YEAR,sep='/'),"%m/%d/%Y")) %>% 
  group_by(FAMILY) %>%
  arrange(YEAR,MONTH,DAY) %>%
  mutate(days_since_1st=formd_date-min(formd_date))

Some descriptive plots

Let’s see what we’re working with.

First, the raw data – a line for every family:

pdr2_time_bxtrans_t2 %>% filter(callindex < 33) %>%
  ggplot(aes(x=callindex,y=bextot))+
    geom_line(aes(group=FAMILY),alpha=.1)+
    theme(panel.background=element_rect(fill='white'))

Next, a smoothed loess curve for every family:

pdr2_time_bxtrans_t2 %>% filter(callindex < 33) %>%
  ggplot(aes(x=callindex,y=bextot))+
    geom_smooth(aes(group=FAMILY),method=loess,se=F)+
    theme(panel.background=element_rect(fill='white'))

Finally, the first plot again, but with a group-level smoothed curve – a line, and a second and third degree polynomial just to see what that looks like:

pdr2_time_bxtrans_t2 %>% filter(callindex < 33) %>%
  ggplot(aes(x=callindex,y=bextot))+
    geom_line(aes(group=FAMILY),alpha=.05)+
    geom_smooth(method=lm,formula=y~poly(x,1),color='dark orange',se=F)+
    geom_smooth(method=lm,formula=y~poly(x,2),color='red',se=F)+
    geom_smooth(method=lm,formula=y~poly(x,3),color='black',se=F)+
    theme(panel.background=element_rect(fill='white'))

From the plot it looks like either a linear or quadratic would capture this pretty well.

Widen the data by call index

Now we can make a wide data file to examine the trend of bextot.

pdr2_bex_w_by_index<-
  pdr2_time_bxtrans_t2 %>% select(FAMILY,bextot,callindex) %>% 
  filter(callindex<33) %>% # just the first 32 to minimize missingness
  mutate(callindex=paste('call',formatC(callindex, width = 2, format = "d", flag = "0"),sep='_')) %>%
  spread(callindex,bextot)

Make R make the lavaan model code

let’s make the lavaan code. We don’t want to write out each ‘call_N’ var, so we can do it programatically

(allthecallnames<-grep('call',names(pdr2_bex_w_by_index),value=T))

##  [1] "call_01" "call_02" "call_03" "call_04" "call_05" "call_06" "call_07"
##  [8] "call_08" "call_09" "call_10" "call_11" "call_12" "call_13" "call_14"
## [15] "call_15" "call_16" "call_17" "call_18" "call_19" "call_20" "call_21"
## [22] "call_22" "call_23" "call_24" "call_25" "call_26" "call_27" "call_28"
## [29] "call_29" "call_30" "call_31" "call_32"

(callnameswith_i_weights<-paste('1*',allthecallnames,sep=''))

##  [1] "1*call_01" "1*call_02" "1*call_03" "1*call_04" "1*call_05"
##  [6] "1*call_06" "1*call_07" "1*call_08" "1*call_09" "1*call_10"
## [11] "1*call_11" "1*call_12" "1*call_13" "1*call_14" "1*call_15"
## [16] "1*call_16" "1*call_17" "1*call_18" "1*call_19" "1*call_20"
## [21] "1*call_21" "1*call_22" "1*call_23" "1*call_24" "1*call_25"
## [26] "1*call_26" "1*call_27" "1*call_28" "1*call_29" "1*call_30"
## [31] "1*call_31" "1*call_32"

Centering is really important for interpretting the linear effect in the presence of a quadratic. The intercept and linear slope term here are interpreted only at the first timepoint, though one can easily calculate the expected value of a point at any timepoint.

(callnameswith_s_weights<-paste(0:31,'*',allthecallnames,sep=''))

##  [1] "0*call_01"  "1*call_02"  "2*call_03"  "3*call_04"  "4*call_05" 
##  [6] "5*call_06"  "6*call_07"  "7*call_08"  "8*call_09"  "9*call_10" 
## [11] "10*call_11" "11*call_12" "12*call_13" "13*call_14" "14*call_15"
## [16] "15*call_16" "16*call_17" "17*call_18" "18*call_19" "19*call_20"
## [21] "20*call_21" "21*call_22" "22*call_23" "23*call_24" "24*call_25"
## [26] "25*call_26" "26*call_27" "27*call_28" "28*call_29" "29*call_30"
## [31] "30*call_31" "31*call_32"

(callnameswith_q_weights<-paste((0:31)^2,'*',allthecallnames,sep=''))

##  [1] "0*call_01"   "1*call_02"   "4*call_03"   "9*call_04"   "16*call_05" 
##  [6] "25*call_06"  "36*call_07"  "49*call_08"  "64*call_09"  "81*call_10" 
## [11] "100*call_11" "121*call_12" "144*call_13" "169*call_14" "196*call_15"
## [16] "225*call_16" "256*call_17" "289*call_18" "324*call_19" "361*call_20"
## [21] "400*call_21" "441*call_22" "484*call_23" "529*call_24" "576*call_25"
## [26] "625*call_26" "676*call_27" "729*call_28" "784*call_29" "841*call_30"
## [31] "900*call_31" "961*call_32"

(i_weights_collapsed<-paste(callnameswith_i_weights,collapse=' + '))

## [1] "1*call_01 + 1*call_02 + 1*call_03 + 1*call_04 + 1*call_05 + 1*call_06 + 1*call_07 + 1*call_08 + 1*call_09 + 1*call_10 + 1*call_11 + 1*call_12 + 1*call_13 + 1*call_14 + 1*call_15 + 1*call_16 + 1*call_17 + 1*call_18 + 1*call_19 + 1*call_20 + 1*call_21 + 1*call_22 + 1*call_23 + 1*call_24 + 1*call_25 + 1*call_26 + 1*call_27 + 1*call_28 + 1*call_29 + 1*call_30 + 1*call_31 + 1*call_32"

(s_weights_collapsed<-paste(callnameswith_s_weights,collapse=' + '))

## [1] "0*call_01 + 1*call_02 + 2*call_03 + 3*call_04 + 4*call_05 + 5*call_06 + 6*call_07 + 7*call_08 + 8*call_09 + 9*call_10 + 10*call_11 + 11*call_12 + 12*call_13 + 13*call_14 + 14*call_15 + 15*call_16 + 16*call_17 + 17*call_18 + 18*call_19 + 19*call_20 + 20*call_21 + 21*call_22 + 22*call_23 + 23*call_24 + 24*call_25 + 25*call_26 + 26*call_27 + 27*call_28 + 28*call_29 + 29*call_30 + 30*call_31 + 31*call_32"

(q_weights_collapsed<-paste(callnameswith_q_weights,collapse=' + '))

## [1] "0*call_01 + 1*call_02 + 4*call_03 + 9*call_04 + 16*call_05 + 25*call_06 + 36*call_07 + 49*call_08 + 64*call_09 + 81*call_10 + 100*call_11 + 121*call_12 + 144*call_13 + 169*call_14 + 196*call_15 + 225*call_16 + 256*call_17 + 289*call_18 + 324*call_19 + 361*call_20 + 400*call_21 + 441*call_22 + 484*call_23 + 529*call_24 + 576*call_25 + 625*call_26 + 676*call_27 + 729*call_28 + 784*call_29 + 841*call_30 + 900*call_31 + 961*call_32"

model <- paste(
  ' i =~ ',i_weights_collapsed,'\n\n',
  ' s =~ ',s_weights_collapsed,'\n\n',
  ' q =~ ',q_weights_collapsed,sep='')
cat(model)

##  i =~ 1*call_01 + 1*call_02 + 1*call_03 + 1*call_04 + 1*call_05 + 1*call_06 + 1*call_07 + 1*call_08 + 1*call_09 + 1*call_10 + 1*call_11 + 1*call_12 + 1*call_13 + 1*call_14 + 1*call_15 + 1*call_16 + 1*call_17 + 1*call_18 + 1*call_19 + 1*call_20 + 1*call_21 + 1*call_22 + 1*call_23 + 1*call_24 + 1*call_25 + 1*call_26 + 1*call_27 + 1*call_28 + 1*call_29 + 1*call_30 + 1*call_31 + 1*call_32
## 
##  s =~ 0*call_01 + 1*call_02 + 2*call_03 + 3*call_04 + 4*call_05 + 5*call_06 + 6*call_07 + 7*call_08 + 8*call_09 + 9*call_10 + 10*call_11 + 11*call_12 + 12*call_13 + 13*call_14 + 14*call_15 + 15*call_16 + 16*call_17 + 17*call_18 + 18*call_19 + 19*call_20 + 20*call_21 + 21*call_22 + 22*call_23 + 23*call_24 + 24*call_25 + 25*call_26 + 26*call_27 + 27*call_28 + 28*call_29 + 29*call_30 + 30*call_31 + 31*call_32
## 
##  q =~ 0*call_01 + 1*call_02 + 4*call_03 + 9*call_04 + 16*call_05 + 25*call_06 + 36*call_07 + 49*call_08 + 64*call_09 + 81*call_10 + 100*call_11 + 121*call_12 + 144*call_13 + 169*call_14 + 196*call_15 + 225*call_16 + 256*call_17 + 289*call_18 + 324*call_19 + 361*call_20 + 400*call_21 + 441*call_22 + 484*call_23 + 529*call_24 + 576*call_25 + 625*call_26 + 676*call_27 + 729*call_28 + 784*call_29 + 841*call_30 + 900*call_31 + 961*call_32

Fit the model!

fit <- growth(model, data=pdr2_bex_w_by_index)

Here’s the model we fit:

semPaths(fit)

And here’s the summary:

summary(fit)

## lavaan (0.5-18) converged normally after 169 iterations
## 
##                                                   Used       Total
##   Number of observations                            72         100
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic              838.225
##   Degrees of freedom                               519
##   P-value (Chi-square)                           0.000
## 
## Parameter estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
##                    Estimate  Std.err  Z-value  P(>|z|)
## Latent variables:
##   i =~
##     call_01           1.000
##     call_02           1.000
##     call_03           1.000
##     call_04           1.000
##     call_05           1.000
##     call_06           1.000
##     call_07           1.000
##     call_08           1.000
##     call_09           1.000
##     call_10           1.000
##     call_11           1.000
##     call_12           1.000
##     call_13           1.000
##     call_14           1.000
##     call_15           1.000
##     call_16           1.000
##     call_17           1.000
##     call_18           1.000
##     call_19           1.000
##     call_20           1.000
##     call_21           1.000
##     call_22           1.000
##     call_23           1.000
##     call_24           1.000
##     call_25           1.000
##     call_26           1.000
##     call_27           1.000
##     call_28           1.000
##     call_29           1.000
##     call_30           1.000
##     call_31           1.000
##     call_32           1.000
##   s =~
##     call_01           0.000
##     call_02           1.000
##     call_03           2.000
##     call_04           3.000
##     call_05           4.000
##     call_06           5.000
##     call_07           6.000
##     call_08           7.000
##     call_09           8.000
##     call_10           9.000
##     call_11          10.000
##     call_12          11.000
##     call_13          12.000
##     call_14          13.000
##     call_15          14.000
##     call_16          15.000
##     call_17          16.000
##     call_18          17.000
##     call_19          18.000
##     call_20          19.000
##     call_21          20.000
##     call_22          21.000
##     call_23          22.000
##     call_24          23.000
##     call_25          24.000
##     call_26          25.000
##     call_27          26.000
##     call_28          27.000
##     call_29          28.000
##     call_30          29.000
##     call_31          30.000
##     call_32          31.000
##   q =~
##     call_01           0.000
##     call_02           1.000
##     call_03           4.000
##     call_04           9.000
##     call_05          16.000
##     call_06          25.000
##     call_07          36.000
##     call_08          49.000
##     call_09          64.000
##     call_10          81.000
##     call_11         100.000
##     call_12         121.000
##     call_13         144.000
##     call_14         169.000
##     call_15         196.000
##     call_16         225.000
##     call_17         256.000
##     call_18         289.000
##     call_19         324.000
##     call_20         361.000
##     call_21         400.000
##     call_22         441.000
##     call_23         484.000
##     call_24         529.000
##     call_25         576.000
##     call_26         625.000
##     call_27         676.000
##     call_28         729.000
##     call_29         784.000
##     call_30         841.000
##     call_31         900.000
##     call_32         961.000
## 
## Covariances:
##   i ~~
##     s                -0.266    0.154   -1.721    0.085
##     q                 0.004    0.004    1.008    0.313
##   s ~~
##     q                -0.001    0.000   -2.018    0.044
## 
## Intercepts:
##     call_01           0.000
##     call_02           0.000
##     call_03           0.000
##     call_04           0.000
##     call_05           0.000
##     call_06           0.000
##     call_07           0.000
##     call_08           0.000
##     call_09           0.000
##     call_10           0.000
##     call_11           0.000
##     call_12           0.000
##     call_13           0.000
##     call_14           0.000
##     call_15           0.000
##     call_16           0.000
##     call_17           0.000
##     call_18           0.000
##     call_19           0.000
##     call_20           0.000
##     call_21           0.000
##     call_22           0.000
##     call_23           0.000
##     call_24           0.000
##     call_25           0.000
##     call_26           0.000
##     call_27           0.000
##     call_28           0.000
##     call_29           0.000
##     call_30           0.000
##     call_31           0.000
##     call_32           0.000
##     i                 7.402    0.417   17.760    0.000
##     s                -0.261    0.038   -6.913    0.000
##     q                 0.005    0.001    5.015    0.000
## 
## Variances:
##     call_01          16.731    3.107
##     call_02           9.807    1.915
##     call_03          11.202    2.087
##     call_04           9.089    1.700
##     call_05          17.265    3.023
##     call_06          15.249    2.667
##     call_07          18.412    3.178
##     call_08          15.511    2.684
##     call_09          16.776    2.886
##     call_10          10.061    1.764
##     call_11          12.751    2.208
##     call_12          14.664    2.524
##     call_13          12.104    2.097
##     call_14          13.257    2.289
##     call_15          11.621    2.017
##     call_16           7.045    1.257
##     call_17          10.488    1.828
##     call_18          11.085    1.927
##     call_19           9.291    1.628
##     call_20           8.794    1.545
##     call_21          11.203    1.945
##     call_22          11.655    2.020
##     call_23           7.416    1.316
##     call_24           7.003    1.249
##     call_25          10.300    1.800
##     call_26          13.740    2.377
##     call_27          13.083    2.275
##     call_28           8.501    1.526
##     call_29           8.287    1.507
##     call_30           8.291    1.533
##     call_31           7.677    1.469
##     call_32          10.034    1.899
##     i                 9.498    2.095
##     s                 0.042    0.017
##     q                 0.000    0.000

Starting with this growth model, we could now add in predictors and outcomes for the latent i, s, and q variables.

April 21, 2015

SEM in R: Adding a time index and recoding the Bahevioral Scores

#' First, reading in the data:
#' 
library(foreign)
setwd('C:/Users/flournoy/Downloads')

pdr2<-read.spss("PDR Wave 2.sav", to.data.frame=T)
pdr4<-read.spss("PDR Wave 4.sav", to.data.frame=T)
head(pdr2)
summary(pdr2)

#' ## Generate a time variable
#' 
#' This indexes each call for each family. This time variable
#' will be important later for widening the data for SEM.
#' 
library(dplyr)

pdr2_time <- pdr2 %>%
group_by(FAMILY) %>% #do the count by family
 arrange(YEAR,MONTH,DAY) %>% #sort by date
 mutate(callindex=1:n()) #create call index that’s 1:end for each family

head(pdr2_time)

#' ## Create composite score of kid bex
#' 
#' Scores Found in (cols 8:47).

library(car)
#?recode
## recode all of the kid behavior items into numeric variables in one go
# ignore difference between “occurred, stressed” and “occurred, not stressed”
# (if it occurred at all, it gets a 1; if it didn’t occur, it’s a 0)
# the defaults for as.XX.result are both TRUE; if you leave it that way,
# it will return character variables.

pdr2_time[,8:47]<-sapply(pdr2_time[,8:47],
 function(x)
 {x<-recode(x,"'DID NOT OCCUR'='0'; else = '1'",
 as.factor.result=F, as.numeric.result=T)})

head(pdr2_time)
str(pdr2_time)

#' Now we can create the composite (sum) variable
#' 
pdr2_time$bextot<-rowSums(pdr2_time[,8:47],na.rm=F)

summary(pdr2_time$bextot)

Tagged: sem