
Convergence!

Check out these pretty tools:
mcmcplot()
ggmcmc()

# Example time!
library(R2jags)
?gelman.diag
?geweke.diag
library(car) # for the Prestige data
data(Prestige)
df <- Prestige[,1:4] # education, income, women, prestige

# model 1: weak priors
# (both models below get assigned to modelString; to fit this one, write it out
# with writeLines() before it's overwritten by model 2)
modelString = "
model {
  for(i in 1:N){
    prestige[i] ~ dnorm(mu[i], tau)
    mu[i] <- beta0 + beta1 * income[i] + beta2 * education[i]
  }
  beta0 ~ dnorm(0, .01) # note: JAGS dnorm takes a precision (1/variance); .01 means SD = 10
  beta1 ~ dunif(-100, 100)
  beta2 ~ dunif(-100, 100)
  tau ~ dgamma(.01, .01) # note: JAGS uses shape and rate for gamma parameters
}
"
# plotting priors to play around with parameters
plot(density(rgamma(10000, shape=3, rate=2)), xlim=c(0,10))

# model 2: stronger priors (gamma support is positive, so these force the slopes to be positive)
modelString = "
model {
  for(i in 1:N){
    prestige[i] ~ dnorm(mu[i], tau)
    mu[i] <- beta0 + beta1 * income[i] + beta2 * education[i]
  }
  beta0 ~ dnorm(0, .01) 
  beta1 ~ dgamma(1, .1)
  beta2 ~ dgamma(1, .1)
  tau ~ dgamma(.01, .01)
}
"
writeLines(modelString, con = "model.jags") # writes out model 2 (the current value of modelString)

prestige <- df$prestige
income <- df$income
education <- df$education
N <- length(df$prestige)

data <- list(prestige=prestige, income=income, education=education, N=N)
params <- c("beta0", "beta1", "beta2")
inits1 <- list("beta0"=0, "beta1"=0, "beta2"=0)
inits2 <- list("beta0"=-7, "beta1"=1, "beta2"=4) # sorta from the OLS estimates
inits3 <- list("beta0"=10, "beta1"=10, "beta2"=10)
inits <- list(inits1, inits2, inits3)

set.seed(123)

fit <- jags(data = data, inits = inits,
            parameters.to.save = params, n.chains = 3, n.iter = 9000,
            n.burnin = 1000, model.file = "model.jags")

print(fit)
summary(lm(prestige ~ income + education, data=Prestige)) # to compare

fit.mcmc <- as.mcmc(fit)

summary(fit.mcmc)
plot(fit)
traceplot(fit)

library(lattice) # xyplot() and densityplot() are lattice generics; coda supplies the mcmc methods
xyplot(fit.mcmc)
xyplot(fit.mcmc, aspect="fill")
densityplot(fit.mcmc)
densityplot(fit.mcmc, aspect="fill")
autocorr.plot(fit.mcmc)

gelman.diag(fit.mcmc)  # Gelman-Rubin PSRF; values near 1 suggest convergence
gelman.plot(fit.mcmc)
geweke.diag(fit.mcmc)  # z-scores comparing early vs. late portions of each chain
raftery.diag(fit.mcmc) # how many iterations you need for stable quantile estimates
heidel.diag(fit.mcmc)  # stationarity and halfwidth tests

# one-stop shopping:
library(superdiag)
superdiag(fit.mcmc, burnin = 100)

library(mcmcplots)
denplot(fit.mcmc)
denplot(fit.mcmc, parms = c("beta0", "beta1", "beta2"))
traplot(fit.mcmc, parms = c("beta0", "beta1", "beta2"))
traplot(fit.mcmc[1], parms = c("beta0", "beta1", "beta2")) # one chain at a time
traplot(fit.mcmc[2], parms = c("beta0", "beta1", "beta2"))
traplot(fit.mcmc[3], parms = c("beta0", "beta1", "beta2"))
mcmcplot(fit.mcmc) # whoa!!!

caterplot(fit.mcmc)
caterplot(fit.mcmc, parms = c("beta0", "beta1", "beta2"),
          labels = c("Intercept", "income", "education"))

library(ggmcmc) # pretty :)
fit.gg <- ggs(fit.mcmc)
ggs_density(fit.gg)
ggs_traceplot(fit.gg)
ggmcmc(fit.gg) # makes gorgeous compilation of plots saved as ggmcmc-output.pdf
# The PDF includes density plots of the whole chain (black) overlaid with the
# density of only the latest part of the chain (green); by default, the latest
# 10% of the chain is used as the comparison.

Bayes by hand

You may have come across the phrase “conjugate prior” in your reading. It’s an important concept historically, and in my opinion, working through an example by hand does a lot to solidify your understanding of how Bayes’ Theorem actually works.

The historical piece is that back in the day, before computers or whatever, the only way to do Bayesian analysis was if you had conjugate priors. That is to say, you needed your prior distribution and your likelihood to match up magically to produce a workable posterior. This webpage (http://www.johndcook.com/conjugate_prior_diagram.html) lists several such magical pairs, and gives the PMF or PDF (probability mass/density function) for each. What makes these pairs special (“conjugate”) is that when you multiply the prior by the likelihood, the posterior that comes out belongs to the same family as the prior, so it’s another distribution we already know how to describe. It is technically possible to multiply any prior and likelihood together, of course, but a lot of the time the result is a weird, unique snowflake of a distribution that isn’t definable in terms of a nice, neat formula. If the posterior you get is a well-behaved distribution with a known PMF or PDF, it’s very easy to work with: you can plug any value into that function and get its probability (or density), for example. If your posterior is a unique snowflake, you have no formula for doing that.

Nowadays, we don’t do it by hand at all; JAGS brute-forces a posterior distribution for us, regardless of how ugly and/or unique it might be. So it’s no longer necessary to stick to conjugate priors to do Bayesian analysis. This is still a nice problem to work through, though, since it makes you think about 1) what the likelihood function is, and 2) how exactly the information from your data gets combined with your prior distribution to produce your posterior. To see the example we walked through in club today, check out my homework assignment from the super awesome summer class I took on Bayesian Modeling at ICPSR. This document is my answers to our first homework assignment:
gamma-Poisson demonstration

Here is a Desmos graph showing the relationship between the prior and posterior distributions in this example (thank you, John!): https://www.desmos.com/calculator/hzt65i2top
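
If you'd rather see the conjugacy numerically than algebraically, here's a quick R check you can run yourself (the prior parameters and data are made up for illustration): multiply a gamma prior by a Poisson likelihood on a grid, and the result lands exactly on the gamma posterior the conjugacy formula predicts.

# Gamma prior on a Poisson rate lambda: lambda ~ Gamma(shape = a, rate = b)
a <- 2; b <- 1
y <- c(3, 5, 4, 6, 2) # made-up Poisson counts
n <- length(y)

# Conjugacy says the posterior is Gamma(a + sum(y), b + n)
a.post <- a + sum(y)
b.post <- b + n

# Brute force: prior x likelihood on a grid, then normalize
lambda <- seq(0.01, 15, by = 0.01)
unnorm <- dgamma(lambda, shape = a, rate = b) *
  sapply(lambda, function(l) prod(dpois(y, l)))
post.grid <- unnorm / sum(unnorm * 0.01)

# The two curves should lie right on top of each other
plot(lambda, post.grid, type = "l", lwd = 4, col = "grey",
     main = "gamma-Poisson conjugacy check", ylab = "density")
lines(lambda, dgamma(lambda, shape = a.post, rate = b.post), lty = 2, col = "red")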

Homework! If you want additional practice (and why wouldn’t you, right?), show that the beta and binomial are another conjugate pair. See the new post on doing Bayes by hand for the answers.

A Quick Reading for Tomorrow

Tomorrow, for our first meeting of Bayes Club for the term, we’ll be going through some JAGS code and talking more about MCMC algorithms. If you have a moment, take 5-10 minutes to look through a StackExchange post here. It’s a collection of people’s suggestions for explaining MCMC to a beginner (which we all qualify as), and it was really helpful to me. Some of the answers are clearer than others, so feel free to just skim the page for whatever makes the most immediate sense.

See you tomorrow!

Plotting

Jason asked whether R can produce 3D plots, like the posterior distribution plots Kruschke uses in his book. The answer is ohmygod yes: http://alstatr.blogspot.com/2014/02/r-animating-2d-and-3d-plots.html 
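
If you want a quick taste in base R before clicking through, persp() will draw a simple surface. Here's a minimal sketch using a made-up "posterior" (two independent standard normals):

# Evaluate a bivariate normal density on a grid and draw it as a 3D surface
x <- seq(-3, 3, length.out = 50)
y <- seq(-3, 3, length.out = 50)
z <- outer(x, y, function(x, y) dnorm(x) * dnorm(y)) # independent standard normals
persp(x, y, z, theta = 30, phi = 25, expand = 0.6, col = "lightblue",
      xlab = "theta1", ylab = "theta2", zlab = "density")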

Playing with priors code!

I wrote up some code (not all of it has been tested yet, though!) for possible implementations of some prior manipulations on the simple regression model we looked at this afternoon. Here is a link to the file: https://www.dropbox.com/s/3r5hlx78zrptqy3/playing%20with%20priors.r

It’s just the JAGS model portion for each manipulation; one hypothetical example of the flavor is sketched below. If you have any code you came up with, please feel free to add it to this post as a comment!
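
For instance (this is my own made-up illustration of the kind of manipulation I mean, not code copied from that file), you might give beta1 a strong prior centered in the wrong place and watch how far the data can pull the posterior back:

model {
  for(i in 1:N){
    prestige[i] ~ dnorm(mu[i], tau)
    mu[i] <- beta0 + beta1 * income[i] + beta2 * education[i]
  }
  beta0 ~ dnorm(0, .01)
  beta1 ~ dnorm(-5, 100)   # precision 100 = SD 0.1: strong, and badly placed
  beta2 ~ dunif(-100, 100)
  tau ~ dgamma(.01, .01)
}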

Next Bayes Club: Getting our hands dirty with some R and JAGS

Our next Bayes Club meeting (this Tuesday) will be a close analysis of Kruschke’s BEST model, which is a model for comparing two groups to see if they’re drawn from populations with different means (i.e. something you would do instead of a t-test). You can download all of his code for this model (and many others) from his website:

http://www.indiana.edu/~kruschke/BEST/BEST.zip
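
As a preview, here is a rough sketch of the heart of the model, reconstructed from my reading of the article rather than copied from his code (treat BEST.zip as the authoritative version; muM, muP, sigmaLow, and sigmaHigh stand in for constants computed from the pooled data):

modelString = "
model {
  for(i in 1:Ntotal){
    y[i] ~ dt(mu[group[i]], tau[group[i]], nu) # t instead of normal, to absorb outliers
  }
  for(j in 1:2){
    mu[j] ~ dnorm(muM, muP)               # broad prior centered on the pooled mean
    tau[j] <- 1 / pow(sigma[j], 2)        # JAGS wants precision, not SD
    sigma[j] ~ dunif(sigmaLow, sigmaHigh) # broad uniform prior on each group's SD
  }
  nu <- nuMinusOne + 1     # normality parameter, constrained to nu >= 1
  nuMinusOne ~ dexp(1/29)  # gives nu a prior mean of 30
}
"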

He has written an entire article specifically focusing on this model including details about the code, analysis, etc. Definitely skim it before Tuesday if you get a chance:

http://www.indiana.edu/~kruschke/articles/Kruschke2013JEPG.pdf

Or, if you’re sick of reading things all day long, watch the video instead: http://youtu.be/fhw1j1Ru2i0

 And if you want to play around with a simulation, check this out: http://sumsar.net/best_online/

We’ll spend almost the entire meeting on Tuesday going through the graphical model in detail and walking through all the R and JAGS code line by line. We hope that by the end of the day, you’ll be able to manipulate the model yourself (e.g. change the number of groups to do a one-way ANOVA instead of a t-test, etc.). If you don’t already have R and JAGS ready to go on your computer, please try to get them installed before we meet. And, of course, email me or Jacob with any questions.

Extra tidbit: We won’t go through this on Tuesday, but as you move forward, you may want to have a copy of the JAGS manual, which includes a list of all the distributions JAGS understands, along with the code for calling them (see Chapter 6):

http://people.math.aau.dk/~kkb/Undervisning/Bayes13/sorenh/docs/jags_user_manual.pdf
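
To give you the flavor, here are a few parameterizations from that chapter that tend to trip up R users:

# y ~ dnorm(mu, tau)    # tau is a precision (1/variance), not an SD
# y ~ dgamma(r, lambda) # shape and rate
# y ~ dbin(p, n)        # probability first, then number of trials (opposite order from R's rbinom)
# y ~ dt(mu, tau, k)    # location-scale t with k degrees of freedom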

P.S. Sorry none of these links are clickable. WordPress is refusing to play nicely with others. When it’s sorted out, I’ll go back and update this post with real links.