R-Related Resources for Documenting and Publishing Work
I’ve come across a few exciting resources in the past few days:
R + Markdown + Knitr
tl;dr: There’s an R package, Knitr, that lets you write up manuscripts in plain, version-controllable text and regenerate the graphs automatically, so that you don’t have to redo them by hand every time you change some little thing.
This part isn’t as much about learning R as about documenting work in R:
Carl Boettiger is a researcher at UC Santa Cruz who’s published what I think is a remarkable Open Lab Notebook. The concept of Open Lab Notebooks is interesting in itself: the notes that he takes on everything he does in his lab, on a daily basis, are version-controlled and posted to his public website. He is apparently even working on a plugin that would embed his recent Twitter conversations and Mendeley readings at the top of each notebook entry, making it clear what he was reading about and discussing at the time. Open Notebooks like this are a big part of the Open Science movement.
What’s especially cool about Boettiger’s notebook, though, is how he makes it work. He’s posted detailed write-ups about this here and here. As I understand it, he currently writes up his daily notes in Markdown, which can be learned in about 15 minutes (it lets the writer work in a plain-text editor while still getting headings, lists, bold and italic text, etc.). R code from his analyses can be embedded directly in the Markdown. Then, when he’s finished writing, he passes the file to an R package called Knitr. Knitr runs the embedded R code and inserts the output (including graphs) into the document, so the results are generated dynamically rather than pasted in by hand. He can then use a program called pandoc to convert the result into ready-formatted manuscripts as PDFs, HTML, Word documents, etc.
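To make the workflow concrete, here is a minimal sketch of it in R. This is my own illustration, not Boettiger’s actual setup: the file name entry.Rmd, the notebook text, and the use of R’s built-in mtcars dataset are all stand-ins I made up, and it assumes the knitr package is installed. The pandoc step only runs if pandoc happens to be on your PATH.

```r
# A minimal sketch, not Boettiger's actual setup. Assumes knitr is installed.
library(knitr)

fence <- strrep("`", 3)  # three backticks, the Markdown code-fence marker

# Hypothetical notebook entry: some prose, one R chunk that draws a plot,
# and one inline R expression whose value is filled in at knit time.
rmd_lines <- c(
  "Notes on fuel economy",
  "=====================",
  "",
  "Heavier cars get fewer miles per gallon:",
  "",
  paste0(fence, "{r weight-vs-mpg}"),
  "plot(mtcars$wt, mtcars$mpg)",
  fence,
  "",
  "Mean fuel economy: `r round(mean(mtcars$mpg), 1)` mpg."
)

# Work in a temporary directory so nothing lands in the current project.
workdir <- tempdir()
old_wd <- setwd(workdir)
writeLines(rmd_lines, "entry.Rmd")

# knit() runs the chunk and the inline expression, then writes entry.md
# with the code, its output, and a link to the generated figure.
md_file <- knit("entry.Rmd", quiet = TRUE)

# Optional: convert the knitted Markdown to HTML if pandoc is available.
if (nzchar(Sys.which("pandoc"))) {
  system2("pandoc", c(md_file, "-o", "entry.html"))
}

setwd(old_wd)
```

The point of the sketch is that the plot and the number in the last sentence are never typed in by hand. If the data or the analysis changes, re-running knit() on the same source file regenerates both.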
That sounds extremely exciting to me. I can imagine writing up a manuscript in a plain-text format, so that it can be properly version-controlled and isn’t subject to proprietary file format changes over time. With the analysis code embedded, if I decide to clean the data a different way, or to take out some outlier, no additional work would be necessary to regenerate all of the graphs and output and embed them in the manuscript. How cool is that?
The author of Knitr also has a post on integrating this workflow directly with WordPress. That post is here.
Free Book on Advanced R Programming
This week’s Hacker Newsletter contained a link to a new book called Advanced R development, which is itself currently under development. The author has posted the in-progress content of the book for free; it looks worth checking out!