January 20, 2015

DIY functions

You can write and save your own functions in R, which is very handy for automating a series of commands you do often. It can also make your code much more transparent, which is great news for anyone trying to understand your scripts (including Future You). Here’s how it works:

Give your new function a name.
Define the arguments.
Spell out the code you want R to run each time you call your function.
Tell it what output you want from it.

Here’s the rough structure to follow:

myfunction <- function(arg1, arg2, ... ){
  doing...
  cool...
  stuff...
  return(output)
}

Example time!

Write a function that can take a vector of numbers as input, and return the mean of the numbers as output.

GetMean <- function(vector){
  result <- mean(vector, na.rm=TRUE)
  return(result)
}

What happens if you run that code? Not much, on the surface. R saves that new function for you, though, so later you can call it and provide the necessary argument(s). If you’re using RStudio, you’ll notice your brand new function shows up in the environment window.
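To see for yourself that the function has been stored, you can poke at it like any other object. This is just a small sketch using base R helpers (`exists()`, `is.function()`, `class()`, `formals()`); the definition of GetMean is repeated so the snippet runs on its own.

```r
# Same definition as above, repeated so this snippet is self-contained
GetMean <- function(vector){
  result <- mean(vector, na.rm=TRUE)
  return(result)
}

exists("GetMean")        # TRUE: there is now an object with this name
is.function(GetMean)     # TRUE: and that object is a function
class(GetMean)           # "function"
names(formals(GetMean))  # "vector": the argument name(s) you defined

# Typing the name without parentheses prints the function's code
GetMean
```

Nothing is computed until you actually call the function with an argument.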

Run the code above, and then try this:

GetMean(vector=1:10)

## [1] 5.5

GetMean(vector=rep(6,30))

## [1] 6

GetMean(c(1,3,2.5,NA))

## [1] 2.167

GetMean(iris$Petal.Length) # note that iris is one of the datasets that's built into R.

## [1] 3.758
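Two things are worth noticing in those calls: the NA in the third example did not break anything, and the argument was sometimes named (vector=...) and sometimes not. The sketch below illustrates both points; `x` is just a throwaway name for this example, and GetMean is defined again so the snippet runs on its own.

```r
# Same definition as above, repeated so this snippet is self-contained
GetMean <- function(vector){
  result <- mean(vector, na.rm=TRUE)
  return(result)
}

x <- c(1, 3, 2.5, NA)

mean(x)     # NA, because mean() does not drop missing values by default
GetMean(x)  # the NA is dropped, so this averages 1, 3 and 2.5

# Naming the argument or passing it by position gives the same result
identical(GetMean(vector=1:10), GetMean(1:10))  # TRUE
```

Because na.rm=TRUE lives inside the function, you don't have to remember to type it every time.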

How might you want to use this?

If you find yourself writing the same set of commands over and over, consider putting it into a function. For example, maybe you are doing a series of transformations and you want to generate a histogram after each step, but you’re a data artiste and you refuse to compromise on aesthetics – you can use a function to save all the relevant plotting code in one place and then just call it every time you want to use it.

library(ggplot2)

# Use the iris data as an example (built into R)
str(iris)

## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

# Define the function to generate a histogram with all of the settings I like
PlotHist <- function(var, var.name, fig.num, transformation){
  
  data <- data.frame(value=var) # convert variable vector to data frame for ggplot
  
  # after_stat(density) needs ggplot2 >= 3.3.0; older versions use ..density.. instead
  p <- ggplot(data, aes(x=value)) +
    geom_histogram(aes(y=after_stat(density)), colour="black", fill="white") +
    geom_density(alpha=.2, fill="red") +  # Overlay with transparent density plot
    xlab(var.name) +  # var.name is only used as a label
    ylab(NULL) +
    ggtitle(paste("Figure ", fig.num, ": ", var.name, " (", transformation, ")", sep=""))
  
  return(p)
}

# Plot raw Petal.Length data
PlotHist(var=iris$Petal.Length, var.name="Petal Length", fig.num=1, transformation="raw")

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

[Plot: Figure 1: Petal Length (raw)]

# Run transformations on Petal.Length and get plot after each one
iris$PL.sqrt <- sqrt(iris$Petal.Length)
PlotHist(iris$PL.sqrt, "Petal Length", 2, "square root")

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

[Plot: Figure 2: Petal Length (square root)]

iris$PL.negrec <- -1/iris$Petal.Length
PlotHist(iris$PL.negrec, "Petal Length", 3, "negative reciprocal")

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

[Plot: Figure 3: Petal Length (negative reciprocal)]

# How cool?? So cool.
# Also, how much easier is this to read than if I had copy/pasted my ggplot code three times? So much easier.
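If the list of transformations keeps growing, even the repeated calls could be driven from a list. The sketch below is one possible way to do that, not part of the original example. `PlotHistLite` is a hypothetical, cut-down stand-in for PlotHist so the snippet runs on its own, and `transformations` and `plots` are names made up for illustration. It assumes ggplot2 3.3.0 or later (for `after_stat()`).

```r
library(ggplot2)

# Hypothetical, simplified stand-in for PlotHist (illustration only)
PlotHistLite <- function(var, var.name, fig.num, transformation){
  data <- data.frame(value=var)
  p <- ggplot(data, aes(x=value)) +
    geom_histogram(aes(y=after_stat(density)), bins=30, colour="black", fill="white") +
    xlab(var.name) +
    ggtitle(paste0("Figure ", fig.num, ": ", var.name, " (", transformation, ")"))
  return(p)
}

# Each transformation is stored as a function; the list names become the labels
transformations <- list(
  "raw"                 = function(x) x,
  "square root"         = function(x) sqrt(x),
  "negative reciprocal" = function(x) -1/x
)

# Build one plot per transformation, numbering the figures as we go
plots <- lapply(seq_along(transformations), function(i){
  PlotHistLite(var=transformations[[i]](iris$Petal.Length),
               var.name="Petal Length",
               fig.num=i,
               transformation=names(transformations)[i])
})

plots[[2]]  # show the square root histogram
```

Adding a fourth transformation then means adding one line to the list rather than another pair of commands.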