Dealing with missing data via multiple imputation

Here’s a little teaser for one of tomorrow’s nuggets.

I’ll be talking about using multiple imputation as a remedy for missing data, using the Amelia package. To whet your appetite, check out this pithy post about how R handles missing values in general:

http://www.ats.ucla.edu/stat/r/faq/missing.htm

Multiple Imputation!

Here’s a link to a webinar on missing data (you need to register with your email address to get access to the videos): http://www.theanalysisfactor.com/webinars/recordings/downloads/#v5

Here’s a link to an Rpubs handout: http://rpubs.com/rosemm/33543

And here’s all the relevant code:

install.packages("Amelia")
library(Amelia)

data() # Amelia comes with some datasets
data(africa) # let's pull in the africa dataset
str(africa)
?africa
View(africa)
summary(africa)
summary(lm(civlib ~ trade, data = africa)) # listwise deletion: lm() drops rows with missing values by default

?amelia
m <- 5 # the number of datasets to create (5 is typical)
a.out <- amelia(x = africa, m = m, cs = "country", ts = "year", logs = "gdp_pc") # note that we're using all the variables, even though we won't use them all in the regression
summary(a.out)
plot(a.out)
par(mfrow = c(1, 1)) # reset the plotting layout after plot(a.out)
missmap(a.out)

# run our regression on each dataset
b.out <- NULL
se.out <- NULL
for (i in 1:m) {
  ols.out <- lm(civlib ~ trade, data = a.out$imputations[[i]])
  b.out <- rbind(b.out, coef(ols.out))
  se.out <- rbind(se.out, coef(summary(ols.out))[, 2])
}

# combine the results from all of the different regressions
combined.results <- mi.meld(q = b.out, se = se.out)
combined.results

?AmeliaView # Sounds fun, but it didn't work for me. Meh.
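If you're curious what the combining step is doing: as far as I understand it, mi.meld pools the estimates using Rubin's rules. The pooled coefficient is the average across the imputed datasets, and the pooled standard error combines the average within-dataset variance with the between-dataset variance. Here's a rough sketch of that calculation by hand in base R. The numbers in b_toy and se_toy are made up purely for illustration (they are not from the africa data), and the variable names are my own.

```r
# Toy inputs: one row per imputed dataset, one column per coefficient.
# These values are invented for illustration only.
b_toy  <- rbind(c(1.0, 0.5),
                c(1.2, 0.4),
                c(0.8, 0.6))
se_toy <- rbind(c(0.3, 0.1),
                c(0.3, 0.1),
                c(0.3, 0.1))
colnames(b_toy) <- colnames(se_toy) <- c("(Intercept)", "trade")

m_toy <- nrow(b_toy) # number of imputed datasets

# Pooled point estimate: the mean of the estimates across datasets
pooled_b <- colMeans(b_toy)

# Within-imputation variance: the mean of the squared standard errors
within_var <- colMeans(se_toy^2)

# Between-imputation variance: the sample variance of the estimates
between_var <- apply(b_toy, 2, var)

# Total variance, with the (1 + 1/m) correction for a finite number of imputations
total_var <- within_var + (1 + 1 / m_toy) * between_var
pooled_se <- sqrt(total_var)

print(pooled_b)
print(pooled_se)
```

The takeaway is that the pooled standard error is larger than the average standard error whenever the estimates differ across imputed datasets, which is how the uncertainty due to the missing data makes it into your results.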
