reformatting data to wide

Here’s some thing we were working on in R Club today.

This example shows how to reformat data from (sort of) long to wide. We’ve got pre and post measurements for each subject, some subject-level variables (called dyad vars in the example), and two variables that vary by pre and post. These are the outcome of interest (DV) and a covariate (the length of the video). The data is “sort of” long in that pre and post are each on their own line, but the type of measure (DV or video length) is still each in its own column. To make it fully wide, where there’s only one line for each subject, first we need to make it fully long, and then spread it all out. I’m using dplyr and tidyr here since they’re such lovely tools, but of course there’s more than one way to go about this.

Enjoy!









Here’s the whole thing:

data <- data.frame(subj=sort(c(1:10, 1:10)), cond=rep(c("pre", "post"), 10), DV=rnorm(20), video=rnorm(20), dyadvar1=sort(c(1:10, 1:10)), dyadvar2=sort(c(1:10, 1:10)))

data
##    subj cond          DV      video dyadvar1 dyadvar2
## 1     1  pre -0.09424282 -0.5553683        1        1
## 2     1 post  1.50787647  0.1086909        1        1
## 3     2  pre  1.63471884 -0.5022278        2        2
## 4     2 post  0.06910946 -0.1067023        2        2
## 5     3  pre  0.60311557 -0.3074021        3        3
## 6     3 post  1.34918285  1.0210010        3        3
## 7     4  pre -0.52288040  0.3636248        4        4
## 8     4 post -1.14408452  0.9635112        4        4
## 9     5  pre -1.67596739 -0.5665085        5        5
## 10    5 post  0.45287833  0.1465894        5        5
## 11    6  pre  0.96739333 -0.2250043        6        6
## 12    6 post  0.20794296  1.0133354        6        6
## 13    7  pre  1.25381345 -1.0679324        7        7
## 14    7 post  0.73957481 -1.3426321        7        7
## 15    8  pre -1.26477564 -0.4130490        8        8
## 16    8 post -1.80489529 -0.5602744        8        8
## 17    9  pre -0.85829518  0.8735288        9        9
## 18    9 post  0.52721178  0.1642551        9        9
## 19   10  pre  0.08717665 -0.3085623       10       10
## 20   10 post -1.28951524 -1.7575259       10       10
library(tidyr); library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
wide.data <- data %>%
  gather(key="measure", value="value", DV:video) %>%
  unite(col=key, cond, measure)  %>%
  spread(key=key, value=value)

Here’s each step:

library(knitr)
data %>%
  gather(key="measure", value="value", DV:video)
##    subj cond dyadvar1 dyadvar2 measure       value
## 1     1  pre        1        1      DV -0.09424282
## 2     1 post        1        1      DV  1.50787647
## 3     2  pre        2        2      DV  1.63471884
## 4     2 post        2        2      DV  0.06910946
## 5     3  pre        3        3      DV  0.60311557
## 6     3 post        3        3      DV  1.34918285
## 7     4  pre        4        4      DV -0.52288040
## 8     4 post        4        4      DV -1.14408452
## 9     5  pre        5        5      DV -1.67596739
## 10    5 post        5        5      DV  0.45287833
## 11    6  pre        6        6      DV  0.96739333
## 12    6 post        6        6      DV  0.20794296
## 13    7  pre        7        7      DV  1.25381345
## 14    7 post        7        7      DV  0.73957481
## 15    8  pre        8        8      DV -1.26477564
## 16    8 post        8        8      DV -1.80489529
## 17    9  pre        9        9      DV -0.85829518
## 18    9 post        9        9      DV  0.52721178
## 19   10  pre       10       10      DV  0.08717665
## 20   10 post       10       10      DV -1.28951524
## 21    1  pre        1        1   video -0.55536830
## 22    1 post        1        1   video  0.10869087
## 23    2  pre        2        2   video -0.50222782
## 24    2 post        2        2   video -0.10670226
## 25    3  pre        3        3   video -0.30740210
## 26    3 post        3        3   video  1.02100096
## 27    4  pre        4        4   video  0.36362482
## 28    4 post        4        4   video  0.96351116
## 29    5  pre        5        5   video -0.56650848
## 30    5 post        5        5   video  0.14658942
## 31    6  pre        6        6   video -0.22500427
## 32    6 post        6        6   video  1.01333537
## 33    7  pre        7        7   video -1.06793244
## 34    7 post        7        7   video -1.34263213
## 35    8  pre        8        8   video -0.41304900
## 36    8 post        8        8   video -0.56027441
## 37    9  pre        9        9   video  0.87352882
## 38    9 post        9        9   video  0.16425512
## 39   10  pre       10       10   video -0.30856231
## 40   10 post       10       10   video -1.75752594
data %>%
  gather(key="measure", value="value", DV:video) %>%
  unite(col=key, cond, measure)
##    subj        key dyadvar1 dyadvar2       value
## 1     1     pre_DV        1        1 -0.09424282
## 2     1    post_DV        1        1  1.50787647
## 3     2     pre_DV        2        2  1.63471884
## 4     2    post_DV        2        2  0.06910946
## 5     3     pre_DV        3        3  0.60311557
## 6     3    post_DV        3        3  1.34918285
## 7     4     pre_DV        4        4 -0.52288040
## 8     4    post_DV        4        4 -1.14408452
## 9     5     pre_DV        5        5 -1.67596739
## 10    5    post_DV        5        5  0.45287833
## 11    6     pre_DV        6        6  0.96739333
## 12    6    post_DV        6        6  0.20794296
## 13    7     pre_DV        7        7  1.25381345
## 14    7    post_DV        7        7  0.73957481
## 15    8     pre_DV        8        8 -1.26477564
## 16    8    post_DV        8        8 -1.80489529
## 17    9     pre_DV        9        9 -0.85829518
## 18    9    post_DV        9        9  0.52721178
## 19   10     pre_DV       10       10  0.08717665
## 20   10    post_DV       10       10 -1.28951524
## 21    1  pre_video        1        1 -0.55536830
## 22    1 post_video        1        1  0.10869087
## 23    2  pre_video        2        2 -0.50222782
## 24    2 post_video        2        2 -0.10670226
## 25    3  pre_video        3        3 -0.30740210
## 26    3 post_video        3        3  1.02100096
## 27    4  pre_video        4        4  0.36362482
## 28    4 post_video        4        4  0.96351116
## 29    5  pre_video        5        5 -0.56650848
## 30    5 post_video        5        5  0.14658942
## 31    6  pre_video        6        6 -0.22500427
## 32    6 post_video        6        6  1.01333537
## 33    7  pre_video        7        7 -1.06793244
## 34    7 post_video        7        7 -1.34263213
## 35    8  pre_video        8        8 -0.41304900
## 36    8 post_video        8        8 -0.56027441
## 37    9  pre_video        9        9  0.87352882
## 38    9 post_video        9        9  0.16425512
## 39   10  pre_video       10       10 -0.30856231
## 40   10 post_video       10       10 -1.75752594
data %>%
  gather(key="measure", value="value", DV:video) %>%
  unite(col=key, cond, measure)  %>%
  spread(key=key, value=value)
##    subj dyadvar1 dyadvar2     post_DV post_video      pre_DV  pre_video
## 1     1        1        1  1.50787647  0.1086909 -0.09424282 -0.5553683
## 2     2        2        2  0.06910946 -0.1067023  1.63471884 -0.5022278
## 3     3        3        3  1.34918285  1.0210010  0.60311557 -0.3074021
## 4     4        4        4 -1.14408452  0.9635112 -0.52288040  0.3636248
## 5     5        5        5  0.45287833  0.1465894 -1.67596739 -0.5665085
## 6     6        6        6  0.20794296  1.0133354  0.96739333 -0.2250043
## 7     7        7        7  0.73957481 -1.3426321  1.25381345 -1.0679324
## 8     8        8        8 -1.80489529 -0.5602744 -1.26477564 -0.4130490
## 9     9        9        9  0.52721178  0.1642551 -0.85829518  0.8735288
## 10   10       10       10 -1.28951524 -1.7575259  0.08717665 -0.3085623



Comments are closed.