Dealing with missing data via multiple imputation
Here’s a little teaser for one of tomorrow’s nuggets.
I’ll be talking about using multiple imputation as a remedy for missing data, using the Amelia package. To whet your appetite, check out this pithy post about how R handles missing values in general:
http://www.ats.ucla.edu/stat/r/faq/missing.htm
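If you'd like a quick feel for the basics before clicking through, here's a minimal base-R sketch of how `NA` behaves. The vector and data frame are made up for illustration; nothing here comes from the linked post or from Amelia.

```r
# Toy data, invented purely for illustration
x <- c(2, 4, NA, 8)

is.na(x)                # flags which elements are missing
mean(x)                 # NA: a missing value propagates through arithmetic
mean(x, na.rm = TRUE)   # drop the NA before averaging

df <- data.frame(y = c(1, 2, NA, 5, 4),
                 x = c(1, NA, 3, 4, 5))

complete.cases(df)      # TRUE only for rows with no missing values
nrow(na.omit(df))       # number of rows that survive listwise deletion

# lm() silently drops incomplete rows by default (na.action = na.omit)
fit <- lm(y ~ x, data = df)
nobs(fit)               # number of rows the model actually used
```

The last two lines are the important part: the model gets fit without complaint, just on fewer rows than you gave it, which is the problem multiple imputation is meant to address.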
Multiple Imputation!
Here’s a link to a webinar on missing data (you need to register with your email address to get access to the videos): http://www.theanalysisfactor.com/webinars/recordings/downloads/#v5
Here’s a link to an Rpubs handout: http://rpubs.com/rosemm/33543
And here’s all the relevant code:
install.packages("Amelia")
library(Amelia)
data(package = "Amelia") # list the datasets that come with Amelia
data(africa) # let's pull in the africa dataset
str(africa)
?africa
View(africa)
summary(africa)
summary(lm(civlib ~ trade, data = africa)) # lm() drops incomplete rows by default (listwise deletion)
?amelia
m <- 5 # the number of datasets to create (5 is typical, and is also amelia's default)
# note that we're using all the variables, even though we won't use them all in the regression
a.out <- amelia(x = africa, m = m, cs = "country", ts = "year", logs = "gdp_pc")
summary(a.out)
plot(a.out)
par(mfrow = c(1, 1)) # reset the plotting layout to a single panel
missmap(a.out)
# run our regression on each imputed dataset, stacking one row per dataset
b.out <- NULL  # coefficients
se.out <- NULL # standard errors
for (i in 1:m) {
  ols.out <- lm(civlib ~ trade, data = a.out$imputations[[i]])
  b.out <- rbind(b.out, coef(ols.out))
  se.out <- rbind(se.out, coef(summary(ols.out))[, "Std. Error"])
}
# combine the results from all of the different regressions
combined.results <- mi.meld(q = b.out, se = se.out)
combined.results # pooled coefficients (q.mi) and standard errors (se.mi)
?AmeliaView # Amelia's GUI. Sounds fun, but it didn't work for me. Meh.
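If you're wondering what the combining step is doing with those stacked coefficients and standard errors, here's a rough base-R sketch of the standard pooling rules (Rubin's rules). `pool_rubin` is a hypothetical helper written for this illustration, not an Amelia function, and the numbers are invented. It returns plain vectors, whereas `mi.meld` returns one-row matrices.

```r
# Hypothetical helper illustrating Rubin's rules; not part of Amelia.
# q and se are matrices with one row per imputed dataset and one
# column per coefficient (the same shape as b.out and se.out).
pool_rubin <- function(q, se) {
  m <- nrow(q)                  # number of imputed datasets
  q_bar   <- colMeans(q)        # pooled estimate: average across imputations
  within  <- colMeans(se^2)     # average within-imputation variance
  between <- apply(q, 2, var)   # variance of the estimates across imputations
  total   <- within + (1 + 1 / m) * between
  list(q.mi = q_bar, se.mi = sqrt(total))
}

# Made-up estimates: 3 imputations, 2 coefficients
q  <- rbind(c(1.0, 0.50), c(1.2, 0.40), c(1.1, 0.60))
se <- rbind(c(0.10, 0.05), c(0.12, 0.06), c(0.11, 0.05))
colnames(q) <- colnames(se) <- c("(Intercept)", "trade")

pooled <- pool_rubin(q, se)
print(pooled)
```

The pooled standard error is larger than the average of the individual standard errors because it also counts how much the estimates disagree across the imputed datasets. That extra term is how the uncertainty due to missingness gets carried into the final result.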
Does anyone have a good resource for the ‘Why/When do MI’ question? I’ve found this thing from PSU, and also one from UCLA, but I’d like to hear from folks who’ve actually used this in their work.