reformatting data to wide
Here’s some thing we were working on in R Club today.
This example shows how to reformat data from (sort of) long to wide. We’ve got pre and post measurements for each subject, some subject-level variables (called dyad vars in the example), and two variables that vary by pre and post. These are the outcome of interest (DV) and a covariate (the length of the video). The data is “sort of” long in that pre and post are each on their own line, but the type of measure (DV or video length) is still each in its own column. To make it fully wide, where there’s only one line for each subject, first we need to make it fully long, and then spread it all out. I’m using dplyr and tidyr here since they’re such lovely tools, but of course there’s more than one way to go about this.
Enjoy!
Here’s the whole thing:
data <- data.frame(subj=sort(c(1:10, 1:10)), cond=rep(c("pre", "post"), 10), DV=rnorm(20), video=rnorm(20), dyadvar1=sort(c(1:10, 1:10)), dyadvar2=sort(c(1:10, 1:10)))
data
## subj cond DV video dyadvar1 dyadvar2
## 1 1 pre -0.09424282 -0.5553683 1 1
## 2 1 post 1.50787647 0.1086909 1 1
## 3 2 pre 1.63471884 -0.5022278 2 2
## 4 2 post 0.06910946 -0.1067023 2 2
## 5 3 pre 0.60311557 -0.3074021 3 3
## 6 3 post 1.34918285 1.0210010 3 3
## 7 4 pre -0.52288040 0.3636248 4 4
## 8 4 post -1.14408452 0.9635112 4 4
## 9 5 pre -1.67596739 -0.5665085 5 5
## 10 5 post 0.45287833 0.1465894 5 5
## 11 6 pre 0.96739333 -0.2250043 6 6
## 12 6 post 0.20794296 1.0133354 6 6
## 13 7 pre 1.25381345 -1.0679324 7 7
## 14 7 post 0.73957481 -1.3426321 7 7
## 15 8 pre -1.26477564 -0.4130490 8 8
## 16 8 post -1.80489529 -0.5602744 8 8
## 17 9 pre -0.85829518 0.8735288 9 9
## 18 9 post 0.52721178 0.1642551 9 9
## 19 10 pre 0.08717665 -0.3085623 10 10
## 20 10 post -1.28951524 -1.7575259 10 10
library(tidyr); library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
wide.data <- data %>%
gather(key="measure", value="value", DV:video) %>%
unite(col=key, cond, measure) %>%
spread(key=key, value=value)
Here’s each step:
library(knitr)
data %>%
gather(key="measure", value="value", DV:video)
## subj cond dyadvar1 dyadvar2 measure value
## 1 1 pre 1 1 DV -0.09424282
## 2 1 post 1 1 DV 1.50787647
## 3 2 pre 2 2 DV 1.63471884
## 4 2 post 2 2 DV 0.06910946
## 5 3 pre 3 3 DV 0.60311557
## 6 3 post 3 3 DV 1.34918285
## 7 4 pre 4 4 DV -0.52288040
## 8 4 post 4 4 DV -1.14408452
## 9 5 pre 5 5 DV -1.67596739
## 10 5 post 5 5 DV 0.45287833
## 11 6 pre 6 6 DV 0.96739333
## 12 6 post 6 6 DV 0.20794296
## 13 7 pre 7 7 DV 1.25381345
## 14 7 post 7 7 DV 0.73957481
## 15 8 pre 8 8 DV -1.26477564
## 16 8 post 8 8 DV -1.80489529
## 17 9 pre 9 9 DV -0.85829518
## 18 9 post 9 9 DV 0.52721178
## 19 10 pre 10 10 DV 0.08717665
## 20 10 post 10 10 DV -1.28951524
## 21 1 pre 1 1 video -0.55536830
## 22 1 post 1 1 video 0.10869087
## 23 2 pre 2 2 video -0.50222782
## 24 2 post 2 2 video -0.10670226
## 25 3 pre 3 3 video -0.30740210
## 26 3 post 3 3 video 1.02100096
## 27 4 pre 4 4 video 0.36362482
## 28 4 post 4 4 video 0.96351116
## 29 5 pre 5 5 video -0.56650848
## 30 5 post 5 5 video 0.14658942
## 31 6 pre 6 6 video -0.22500427
## 32 6 post 6 6 video 1.01333537
## 33 7 pre 7 7 video -1.06793244
## 34 7 post 7 7 video -1.34263213
## 35 8 pre 8 8 video -0.41304900
## 36 8 post 8 8 video -0.56027441
## 37 9 pre 9 9 video 0.87352882
## 38 9 post 9 9 video 0.16425512
## 39 10 pre 10 10 video -0.30856231
## 40 10 post 10 10 video -1.75752594
data %>%
gather(key="measure", value="value", DV:video) %>%
unite(col=key, cond, measure)
## subj key dyadvar1 dyadvar2 value
## 1 1 pre_DV 1 1 -0.09424282
## 2 1 post_DV 1 1 1.50787647
## 3 2 pre_DV 2 2 1.63471884
## 4 2 post_DV 2 2 0.06910946
## 5 3 pre_DV 3 3 0.60311557
## 6 3 post_DV 3 3 1.34918285
## 7 4 pre_DV 4 4 -0.52288040
## 8 4 post_DV 4 4 -1.14408452
## 9 5 pre_DV 5 5 -1.67596739
## 10 5 post_DV 5 5 0.45287833
## 11 6 pre_DV 6 6 0.96739333
## 12 6 post_DV 6 6 0.20794296
## 13 7 pre_DV 7 7 1.25381345
## 14 7 post_DV 7 7 0.73957481
## 15 8 pre_DV 8 8 -1.26477564
## 16 8 post_DV 8 8 -1.80489529
## 17 9 pre_DV 9 9 -0.85829518
## 18 9 post_DV 9 9 0.52721178
## 19 10 pre_DV 10 10 0.08717665
## 20 10 post_DV 10 10 -1.28951524
## 21 1 pre_video 1 1 -0.55536830
## 22 1 post_video 1 1 0.10869087
## 23 2 pre_video 2 2 -0.50222782
## 24 2 post_video 2 2 -0.10670226
## 25 3 pre_video 3 3 -0.30740210
## 26 3 post_video 3 3 1.02100096
## 27 4 pre_video 4 4 0.36362482
## 28 4 post_video 4 4 0.96351116
## 29 5 pre_video 5 5 -0.56650848
## 30 5 post_video 5 5 0.14658942
## 31 6 pre_video 6 6 -0.22500427
## 32 6 post_video 6 6 1.01333537
## 33 7 pre_video 7 7 -1.06793244
## 34 7 post_video 7 7 -1.34263213
## 35 8 pre_video 8 8 -0.41304900
## 36 8 post_video 8 8 -0.56027441
## 37 9 pre_video 9 9 0.87352882
## 38 9 post_video 9 9 0.16425512
## 39 10 pre_video 10 10 -0.30856231
## 40 10 post_video 10 10 -1.75752594
data %>%
gather(key="measure", value="value", DV:video) %>%
unite(col=key, cond, measure) %>%
spread(key=key, value=value)
## subj dyadvar1 dyadvar2 post_DV post_video pre_DV pre_video
## 1 1 1 1 1.50787647 0.1086909 -0.09424282 -0.5553683
## 2 2 2 2 0.06910946 -0.1067023 1.63471884 -0.5022278
## 3 3 3 3 1.34918285 1.0210010 0.60311557 -0.3074021
## 4 4 4 4 -1.14408452 0.9635112 -0.52288040 0.3636248
## 5 5 5 5 0.45287833 0.1465894 -1.67596739 -0.5665085
## 6 6 6 6 0.20794296 1.0133354 0.96739333 -0.2250043
## 7 7 7 7 0.73957481 -1.3426321 1.25381345 -1.0679324
## 8 8 8 8 -1.80489529 -0.5602744 -1.26477564 -0.4130490
## 9 9 9 9 0.52721178 0.1642551 -0.85829518 0.8735288
## 10 10 10 10 -1.28951524 -1.7575259 0.08717665 -0.3085623
// add bootstrap table styles to pandoc tables $(document).ready(function () { $('tr.header').parent('thead').parent('table').addClass('table table-condensed'); });