wide to long: what we learned

As the French say, there are many methods to pluck a duck, and this duck is particularly feathery, so this is by no means the only reasonable approach. That said, the solution we ended up going with uses tidyr, and I think it’s a pretty nice one (note that you do need a recent version of R to use it, though).

Note that we used the dummy dataset Melissa provided (see the original post), so the column names are all correct, but the data are just random numbers (that’s why, for example, there are negative subject IDs).

A couple nifty little tidbits:

  1. The sep argument in the reshape() function from the stats package does not work the way you might expect it to! Here is an rpubs post demonstrating the issue.
  2. Also, reshape() chokes on the missing columns in wide format, as I mentioned in the original post, but tidyr handles them just fine, actually (it just assumes you want NAs filled in there). Nice little package. 🙂

Here’s the code!

setwd("~/Downloads")
data <- read.csv("ML Data for R.csv", row.names=1)


# standardize naming convention for variables
colnames(data) <- sub(pattern=".T", replacement="_T", x=colnames(data), fixed=TRUE)

# note that reshape() does not work as expected!! The sep argument in reshape doesn't do what it appears to do.

library(tidyr); library(dplyr) 
# I mostly stole this code from John Flournoy :)
data.long <- data %>% 
  # We gather all the columns except for ADS.id into a single column 
  # of variable names and another column of values.
  gather(variable,value,-ADS.id) %>%
  # We now can separate the variable column by splitting each variable name
  # at the '_T' -- now we have two columns, one with the variable name and
  # one with the timepoint
  separate(variable,c('variable','timepoint'),sep='_T') %>%
  # This step spreads it out so that each variable gets a column with the
  # associated values
  spread(variable,value)

# snapshots of the process...
head(gather(data, variable,value,-ADS.id)) 
##      ADS.id   variable      value
## 1 1.0018516 age.mri_T1 -0.3092724
## 2 1.1181216 age.mri_T1 -0.6904551
## 3 1.1211195 age.mri_T1  0.7601895
## 4 1.0241864 age.mri_T1 -1.0652325
## 5 0.2439787 age.mri_T1  2.1723328
## 6 1.5826904 age.mri_T1  1.0067890
head(separate(gather(data, variable,value,-ADS.id), variable,c('variable','timepoint'),sep='_T'))
##      ADS.id variable timepoint      value
## 1 1.0018516  age.mri         1 -0.3092724
## 2 1.1181216  age.mri         1 -0.6904551
## 3 1.1211195  age.mri         1  0.7601895
## 4 1.0241864  age.mri         1 -1.0652325
## 5 0.2439787  age.mri         1  2.1723328
## 6 1.5826904  age.mri         1  1.0067890
head(data.long) # the result!
##      ADS.id timepoint        age    age.mri   ccesdtot    ccesdtot2
## 1 -3.188239         1         NA  0.2349363 -0.3167452 -0.502415557
## 2 -3.188239         2         NA         NA -1.5850726 -0.004856296
## 3 -3.188239         3 -0.6822444         NA -0.8233850  0.630903711
## 4 -3.188239         4         NA         NA  0.8308063  0.047788788
## 5 -2.384302         1         NA -0.4050702 -0.1712919  1.599556240
## 6 -2.384302         2         NA         NA  0.5175698 -0.888349761
##   D.15.age.months errors.c.cong errors.c.cong.PERC errors.c.inc
## 1              NA     1.1203254         -0.2384952   -1.9943953
## 2              NA            NA                 NA           NA
## 3       -1.294505     0.8450592         -0.2829815    1.1129147
## 4              NA     0.5954789         -0.8289942    0.2315604
## 5              NA    -0.3532590          0.9763195    0.7725308
## 6              NA            NA                 NA           NA
##   errors.c.inc.PERC Errors.congb Errors.congt errors.i.cong
## 1         0.2549887    1.5140582    0.9740986     0.1182785
## 2                NA           NA           NA            NA
## 3         0.3035785    0.6272806   -0.4459871    -0.4603392
## 4         0.4983972   -0.2078286    0.9844005     1.4174055
## 5         0.1856251   -1.7249345    0.0948151    -0.3314606
## 6                NA           NA           NA            NA
##   errors.i.cong.PERC errors.i.inc errors.i.inc.PERC Errors.incb
## 1          0.1808000    1.6272805        -1.5131672  -0.8726752
## 2                 NA           NA                NA          NA
## 3          1.6620035   -0.5556201        -0.3994536   0.5846239
## 4          0.2401470   -0.1398306         1.4187102  -0.9474623
## 5         -0.1865299   -0.7414438         0.2517365   0.4851024
## 6                 NA           NA                NA          NA
##   Errors.inct errors.prac errors.slow.c.cong errors.slow.c.inc
## 1   -1.062035   1.2304198          0.2057605          0.453230
## 2          NA          NA                 NA                NA
## 3   -2.079247   0.2014654         -1.1325324          1.384324
## 4   -0.114631  -0.3909164          0.8461512          2.109982
## 5   -1.908316  -1.2991850          0.2152569          1.251143
## 6          NA          NA                 NA                NA
##   errors.slow.i.cong errors.slow.i.inc errors.slow.prac Errors.total
## 1         -0.5650468      -0.260194217      -1.05898000    1.1322255
## 2                 NA                NA               NA           NA
## 3         -1.5816222       0.005833924       1.05445791    1.3550896
## 4         -1.0825809       0.107324351       0.08879325    0.2439465
## 5          1.1278321      -0.937676510      -0.43545535   -0.3518373
## 6                 NA                NA               NA           NA
##   errors.total.c.cong errors.total.c.inc errors.total.i.cong
## 1           0.9362488         -0.2737423           0.1255926
## 2                  NA                 NA                  NA
## 3          -1.6188499         -0.6009969           0.8908975
## 4          -0.1001239         -0.4400096          -1.5393167
## 5          -0.7427545          1.2649423          -1.2623598
## 6                  NA                 NA                  NA
##   errors.total.i.inc errors.total.PERC errors.total.prac Int.RT.All
## 1         -0.2323253         0.5850435         1.3739468 -1.6736488
## 2                 NA                NA                NA         NA
## 3          0.2039495         1.5558254        -1.3495371 -1.2342580
## 4          1.8545902        -0.8207177        -0.1254748 -0.5403726
## 5          0.9093587        -1.8192946         0.9457083 -1.1837205
## 6                 NA                NA                NA         NA
##   Int.RT.ProB Int.RT.ReaB     RT.avg  rt.c.cong   rt.c.inc    RT.conb
## 1  -0.4787541   0.4969283 -2.3207763  0.1943951 -1.4280047  0.1893392
## 2          NA          NA         NA         NA         NA         NA
## 3  -1.6069571   0.2826211 -0.8354791 -0.4977420 -1.3454648         NA
## 4   0.9181784   0.5912347 -0.7084950  0.4040288  1.9530110         NA
## 5  -2.0270125   0.3067939 -0.1157335 -0.9217092  0.7747549 -1.1509858
## 6          NA          NA         NA         NA         NA         NA
##    RT.congb   RT.congt  rt.i.cong   rt.i.inc    RT.incb    RT.inct
## 1        NA  0.5066862 -0.8200358  0.1091896 -0.1578207 -0.2115042
## 2        NA         NA         NA         NA         NA         NA
## 3 -1.295408  0.2670604 -0.6976511 -0.8435179 -0.9275700  0.8909898
## 4 -1.026102 -0.7030719  1.0139764  1.1513498 -0.9378490 -0.5800233
## 5        NA -0.1882044  0.2780732  0.3299927  1.1015153 -0.5091075
## 6        NA         NA         NA         NA         NA         NA
##        rt.prac
## 1 -1.177762051
## 2           NA
## 3 -0.698241844
## 4 -0.354985516
## 5  0.009850542
## 6           NA
# tidying, checking...
data.long$timepoint <- as.factor(data.long$timepoint)
data.long$ADS.id <- as.factor(data.long$ADS.id)
summary(data.long)
##                ADS.id    timepoint      age             age.mri       
##  -3.18823887209027:  4   1:246     Min.   :-2.8179   Min.   :-2.2391  
##  -2.38430177492141:  4   2:246     1st Qu.:-0.8310   1st Qu.:-0.6778  
##  -2.19765389063435:  4   3:246     Median :-0.0078   Median : 0.0045  
##  -1.93658469426135:  4   4:246     Mean   :-0.0353   Mean   : 0.0473  
##  -1.88871398596907:  4             3rd Qu.: 0.6482   3rd Qu.: 0.7528  
##  -1.87775103867895:  4             Max.   : 2.8244   Max.   : 3.9297  
##  (Other)          :960             NA's   :738       NA's   :738      
##     ccesdtot          ccesdtot2         D.15.age.months  
##  Min.   :-2.73145   Min.   :-3.165353   Min.   :-2.7352  
##  1st Qu.:-0.73198   1st Qu.:-0.627016   1st Qu.:-0.6356  
##  Median :-0.06160   Median :-0.007192   Median :-0.0218  
##  Mean   :-0.04439   Mean   : 0.023682   Mean   : 0.0226  
##  3rd Qu.: 0.60229   3rd Qu.: 0.692493   3rd Qu.: 0.7302  
##  Max.   : 2.87977   Max.   : 2.831624   Max.   : 2.5312  
##                                         NA's   :738      
##  errors.c.cong      errors.c.cong.PERC  errors.c.inc     
##  Min.   :-3.02694   Min.   :-3.02622   Min.   :-3.07194  
##  1st Qu.:-0.65846   1st Qu.:-0.69937   1st Qu.:-0.65187  
##  Median : 0.03656   Median :-0.09637   Median : 0.01856  
##  Mean   :-0.00862   Mean   :-0.01181   Mean   : 0.00975  
##  3rd Qu.: 0.64811   3rd Qu.: 0.63642   3rd Qu.: 0.64352  
##  Max.   : 3.48170   Max.   : 3.13567   Max.   : 3.27508  
##  NA's   :246        NA's   :246        NA's   :246       
##  errors.c.inc.PERC   Errors.congb       Errors.congt     
##  Min.   :-3.83647   Min.   :-3.61005   Min.   :-4.00876  
##  1st Qu.:-0.65952   1st Qu.:-0.67655   1st Qu.:-0.70759  
##  Median :-0.00111   Median :-0.05863   Median :-0.01951  
##  Mean   :-0.00783   Mean   :-0.02327   Mean   :-0.00887  
##  3rd Qu.: 0.59038   3rd Qu.: 0.63483   3rd Qu.: 0.66770  
##  Max.   : 2.95515   Max.   : 3.22904   Max.   : 3.20207  
##  NA's   :246        NA's   :246        NA's   :246       
##  errors.i.cong      errors.i.cong.PERC  errors.i.inc     
##  Min.   :-3.21592   Min.   :-3.28548   Min.   :-3.16182  
##  1st Qu.:-0.71059   1st Qu.:-0.65964   1st Qu.:-0.71632  
##  Median :-0.06600   Median :-0.06822   Median : 0.00092  
##  Mean   :-0.04063   Mean   : 0.00061   Mean   :-0.03424  
##  3rd Qu.: 0.59837   3rd Qu.: 0.64682   3rd Qu.: 0.64873  
##  Max.   : 4.63775   Max.   : 3.26292   Max.   : 2.61837  
##  NA's   :246        NA's   :246        NA's   :246       
##  errors.i.inc.PERC   Errors.incb        Errors.inct      
##  Min.   :-3.18711   Min.   :-2.85501   Min.   :-2.85304  
##  1st Qu.:-0.71387   1st Qu.:-0.60294   1st Qu.:-0.66899  
##  Median :-0.00953   Median : 0.00232   Median :-0.02129  
##  Mean   : 0.00542   Mean   : 0.02473   Mean   : 0.00316  
##  3rd Qu.: 0.71589   3rd Qu.: 0.66226   3rd Qu.: 0.71004  
##  Max.   : 3.72285   Max.   : 3.05961   Max.   : 3.36236  
##  NA's   :246        NA's   :246        NA's   :246       
##   errors.prac       errors.slow.c.cong errors.slow.c.inc 
##  Min.   :-2.65139   Min.   :-2.87464   Min.   :-2.71655  
##  1st Qu.:-0.67979   1st Qu.:-0.68237   1st Qu.:-0.68067  
##  Median : 0.00760   Median : 0.00764   Median : 0.02927  
##  Mean   :-0.02728   Mean   :-0.00233   Mean   : 0.03072  
##  3rd Qu.: 0.66139   3rd Qu.: 0.67304   3rd Qu.: 0.73078  
##  Max.   : 2.64629   Max.   : 3.34258   Max.   : 2.96471  
##  NA's   :246        NA's   :246        NA's   :246       
##  errors.slow.i.cong errors.slow.i.inc  errors.slow.prac  
##  Min.   :-2.97736   Min.   :-3.48673   Min.   :-3.38586  
##  1st Qu.:-0.67957   1st Qu.:-0.64765   1st Qu.:-0.77620  
##  Median : 0.09023   Median :-0.00268   Median :-0.07298  
##  Mean   : 0.05189   Mean   :-0.02303   Mean   :-0.04885  
##  3rd Qu.: 0.70666   3rd Qu.: 0.62909   3rd Qu.: 0.66078  
##  Max.   : 2.64540   Max.   : 2.88500   Max.   : 2.63078  
##  NA's   :246        NA's   :246        NA's   :246       
##   Errors.total      errors.total.c.cong errors.total.c.inc
##  Min.   :-3.02983   Min.   :-3.94584    Min.   :-3.04545  
##  1st Qu.:-0.66766   1st Qu.:-0.71108    1st Qu.:-0.68723  
##  Median : 0.03033   Median :-0.01434    Median :-0.04592  
##  Mean   : 0.00020   Mean   :-0.03496    Mean   :-0.00674  
##  3rd Qu.: 0.67346   3rd Qu.: 0.68086    3rd Qu.: 0.68570  
##  Max.   : 2.83657   Max.   : 2.96111    Max.   : 3.49591  
##  NA's   :246        NA's   :246         NA's   :246       
##  errors.total.i.cong errors.total.i.inc errors.total.PERC 
##  Min.   :-3.35570    Min.   :-3.06428   Min.   :-3.52159  
##  1st Qu.:-0.69939    1st Qu.:-0.75856   1st Qu.:-0.68296  
##  Median :-0.01178    Median :-0.04273   Median :-0.06017  
##  Mean   :-0.02606    Mean   :-0.06233   Mean   :-0.06873  
##  3rd Qu.: 0.68439    3rd Qu.: 0.60996   3rd Qu.: 0.54828  
##  Max.   : 3.05471    Max.   : 2.70558   Max.   : 3.38433  
##  NA's   :246         NA's   :246        NA's   :246       
##  errors.total.prac    Int.RT.All        Int.RT.ProB      
##  Min.   :-2.78635   Min.   :-2.70803   Min.   :-2.93328  
##  1st Qu.:-0.75211   1st Qu.:-0.64684   1st Qu.:-0.64654  
##  Median :-0.11411   Median :-0.01643   Median : 0.06557  
##  Mean   :-0.05913   Mean   :-0.00633   Mean   : 0.02768  
##  3rd Qu.: 0.57326   3rd Qu.: 0.63270   3rd Qu.: 0.68377  
##  Max.   : 3.28847   Max.   : 3.13618   Max.   : 2.98533  
##  NA's   :246        NA's   :246        NA's   :246       
##   Int.RT.ReaB           RT.avg           rt.c.cong       
##  Min.   :-3.11339   Min.   :-2.61333   Min.   :-2.90894  
##  1st Qu.:-0.65444   1st Qu.:-0.64842   1st Qu.:-0.69766  
##  Median : 0.01216   Median :-0.00995   Median :-0.04453  
##  Mean   : 0.03416   Mean   :-0.00645   Mean   :-0.03129  
##  3rd Qu.: 0.72491   3rd Qu.: 0.60780   3rd Qu.: 0.56930  
##  Max.   : 3.30060   Max.   : 3.69951   Max.   : 3.52680  
##  NA's   :246        NA's   :246        NA's   :246       
##     rt.c.inc           RT.conb           RT.congb          RT.congt       
##  Min.   :-3.41278   Min.   :-3.1444   Min.   :-3.7782   Min.   :-3.24379  
##  1st Qu.:-0.67249   1st Qu.:-0.7331   1st Qu.:-0.7437   1st Qu.:-0.64294  
##  Median : 0.04412   Median :-0.0374   Median :-0.0559   Median : 0.02415  
##  Mean   :-0.00284   Mean   :-0.0629   Mean   :-0.0678   Mean   : 0.03364  
##  3rd Qu.: 0.66931   3rd Qu.: 0.5235   3rd Qu.: 0.5887   3rd Qu.: 0.75205  
##  Max.   : 2.85384   Max.   : 2.8513   Max.   : 2.8365   Max.   : 3.49464  
##  NA's   :246        NA's   :738       NA's   :492       NA's   :246       
##    rt.i.cong          rt.i.inc           RT.incb        
##  Min.   :-3.1709   Min.   :-3.28774   Min.   :-3.14474  
##  1st Qu.:-0.6826   1st Qu.:-0.67278   1st Qu.:-0.73181  
##  Median : 0.0261   Median :-0.00448   Median :-0.01798  
##  Mean   : 0.0232   Mean   :-0.01430   Mean   :-0.02712  
##  3rd Qu.: 0.7156   3rd Qu.: 0.62549   3rd Qu.: 0.62710  
##  Max.   : 3.7440   Max.   : 3.14034   Max.   : 3.30890  
##  NA's   :246       NA's   :246        NA's   :246       
##     RT.inct            rt.prac        
##  Min.   :-3.82650   Min.   :-3.29600  
##  1st Qu.:-0.69212   1st Qu.:-0.72985  
##  Median :-0.01255   Median :-0.04557  
##  Mean   :-0.01289   Mean   :-0.04722  
##  3rd Qu.: 0.63413   3rd Qu.: 0.64025  
##  Max.   : 3.81651   Max.   : 2.45001  
##  NA's   :246        NA's   :246

Comments are closed.