# DIY functions

You can write and save your own functions in R, which is very handy for automating a series of commands you do often. It can also make your code much more transparent, which is great news for anyone trying to understand your scripts (including Future You). Here's how it works:

1. Give your new function a name.
2. Define the arguments.
3. Spell out the code you want R to run each time you call your function.
4. Tell it what output you want from it.

Here's the rough structure to follow:

    myfunction <- function(arg1, arg2, ...) {
      # doing...
      # cool...
      # stuff...
      return(output)
    }
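As a minimal sketch of how the four steps map onto real code, here is a small function with each step marked in a comment. `RangeWidth` is a made-up name for illustration, not something built into R, and the `na.rm` argument is just one possible design choice.

```r
# RangeWidth is a hypothetical example name, not a built-in R function
RangeWidth <- function(x, na.rm = TRUE) {   # steps 1 and 2: name and arguments
  limits <- range(x, na.rm = na.rm)         # step 3: the code to run
  width <- limits[2] - limits[1]
  return(width)                             # step 4: the output
}

# Call it like any other function
RangeWidth(c(2, 10, 4))
RangeWidth(c(2, NA, 10))
```

Giving `na.rm` a default value means you can leave it out in most calls and only set it when you want different behavior.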

### Example time!

Write a function that can take a vector of numbers as input, and return the mean of the numbers as output.

    GetMean <- function(vector){
      result <- mean(vector, na.rm=TRUE)
      return(result)
    }

What happens if you run that code? Not much, on the surface. R saves that new function for you, though, so later you can call it and provide the necessary argument(s). If you're using RStudio, you'll notice your brand new function shows up in the environment window.
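If you are not in RStudio, one way to confirm the function was saved is to ask R directly. This is a small sketch using base R only; the `GetMean` definition is repeated so the block runs on its own.

```r
GetMean <- function(vector){
  result <- mean(vector, na.rm=TRUE)
  return(result)
}

exists("GetMean")      # is there an object with this name?
is.function(GetMean)   # is that object a function?
args(GetMean)          # show the arguments it expects
GetMean                # the name without parentheses prints the definition
```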

Run the code above, and then try this:

    GetMean(vector=1:10)
    ## [1] 5.5
    GetMean(vector=rep(6,30))
    ## [1] 6
    GetMean(c(1,3,2.5,NA))
    ## [1] 2.167
    GetMean(iris$Petal.Length) # note that iris is one of the datasets that's built into R.
    ## [1] 3.758

### How might you want to use this?

If you find yourself writing the same set of commands over and over, consider putting it into a function. For example, maybe you are doing a series of transformations and you want to generate a histogram after each step, but you're a data artiste and you refuse to compromise on aesthetics. You can use a function to save all the relevant plotting code in one place and then just call it every time you want to use it.

    library(ggplot2)

    # Use the iris data as an example (built into R)
    str(iris)
    ## 'data.frame':    150 obs. of  5 variables:
    ##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
    ##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
    ##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
    ##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
    ##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

    # Define the function to generate a histogram with all of the settings I like
    PlotHist <- function(var, var.name, fig.num, transformation){

      # Convert the variable vector to a data frame for ggplot.
      # Note: the column is literally named "var.name"; the label passed in
      # as the var.name argument is only used for the axis label and title.
      data <- data.frame(var.name=var)

      p <- ggplot(data, aes(x=var.name)) +
        # ..density.. puts density rather than counts on the y axis
        # (ggplot2 3.4.0 and later prefer after_stat(density))
        geom_histogram(aes(y=..density..), colour="black", fill="white") +
        geom_density(alpha=.2, fill="red") +  # Overlay with transparent density plot
        xlab(var.name) +
        ylab(NULL) +
        ggtitle(paste("Figure ", fig.num, ": ", var.name, " (", transformation, ")", sep=""))
      return(p)
    }

    # Plot raw Petal.Length data
    PlotHist(var=iris$Petal.Length, var.name="Petal Length", fig.num=1, transformation="raw")
    ## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

    # Run transformations on Petal.Length and get plot after each one
    iris$PL.sqrt <- sqrt(iris$Petal.Length)
    PlotHist(iris$PL.sqrt, "Petal Length", 2, "square root")
    ## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

    iris$PL.negrec <- -1/iris$Petal.Length
    PlotHist(iris$PL.negrec, "Petal Length", 3, "negative reciprocal")
    ## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

    # How cool?? So cool.
    # Also, how much easier is this to read than if I had copy/pasted my ggplot code three times? So much easier.
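
The same pattern works for steps that have nothing to do with plotting. As a rough sketch, the helper below collects a couple of summary statistics for each transformation, using base R only. `DescribeVar` and its column names are made up for this illustration and are not part of the original example.

```r
# DescribeVar is a hypothetical helper: one row of summary stats per variable
DescribeVar <- function(var, var.name, transformation){
  data.frame(
    variable = var.name,
    transformation = transformation,
    mean = mean(var, na.rm=TRUE),
    sd = sd(var, na.rm=TRUE)
  )
}

# Reuse it for each transformation instead of repeating the summary code
pl <- iris$Petal.Length
summary.table <- rbind(
  DescribeVar(pl, "Petal Length", "raw"),
  DescribeVar(sqrt(pl), "Petal Length", "square root"),
  DescribeVar(-1/pl, "Petal Length", "negative reciprocal")
)
print(summary.table)
```

If you later decide you also want the median, you add it in one place and every call picks it up.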