End of term reminder

Just a friendly reminder, since we’re approaching the end of the term. If you’re taking Bayes Club for credit, you need to:

1) attend Bayes Club (we’ve been keeping track of attendance, so you’re covered there)

2) send a one-page-ish write-up to Sanjay (sanjay@uoregon.edu) by the end of the term, describing what you’ve done and/or what you’ve learned in Bayes Club this term.

If you are signed up for credit and you think you’ll have trouble meeting one or both of these requirements, email Sanjay.

From Child Development: A gentle intro to Bayesian stats

This article could be the ground floor for your Bayesian skyscraper: it’s clearly written, conceptually precise, and beautifully illustrated with empirical examples and practical guides (in the supplementary materials).

A Gentle Introduction to Bayesian Analysis: Applications to Developmental Research

Rens van de Schoot, David Kaplan, Jaap Denissen, Jens B. Asendorpf, Franz J. Neyer, Marcel A.G. van Aken

Bayesian statistical methods are becoming ever more popular in applied and fundamental research. In this study a gentle introduction to Bayesian analysis is provided. It is shown under what circumstances it is attractive to use Bayesian estimation, and how to interpret properly the results. First, the ingredients underlying Bayesian methods are introduced using a simplified example. Thereafter, the advantages and pitfalls of the specification of prior knowledge are discussed. To illustrate Bayesian methods explained in this study, in a second example a series of studies that examine the theoretical framework of dynamic interactionism are considered. In the Discussion the advantages and disadvantages of using Bayesian statistics are reviewed, and guidelines on how to report on Bayesian statistics are provided.

“…it is clear that it is not possible to think about learning from experience and acting on it without coming to terms with Bayes’ theorem.” (Jerome Cornfield, quoted in De Finetti, 1974a)

Convergence!

Check out these pretty tools:
mcmcplot() (from the mcmcplots package)
ggmcmc() (from the ggmcmc package)

 
# Example time!
library(R2jags)
?gelman.diag
?geweke.diag
library(car)
data(Prestige)
df <- Prestige[,1:4]

# model 1 weak priors
modelString = "
model {
  for(i in 1:N){
    prestige[i] ~ dnorm(mu[i], tau)
    mu[i] <- beta0 + beta1 * income[i] + beta2 * education[i]
  }
  beta0 ~ dnorm(0, .01) # remember: JAGS dnorm is parameterized by mean and precision (1/variance), not SD
  beta1 ~ dunif(-100, 100)
  beta2 ~ dunif(-100, 100)
  tau ~ dgamma(.01, .01) # note: JAGS uses shape and rate for gamma parameters
}
"
# plotting priors to play around with parameters
plot(density(rgamma(10000, shape=3, rate=2)), xlim=c(0,10))

# model 2 strong priors
modelString = "
model {
  for(i in 1:N){
    prestige[i] ~ dnorm(mu[i], tau)
    mu[i] <- beta0 + beta1 * income[i] + beta2 * education[i]
  }
  beta0 ~ dnorm(0, .01) 
  beta1 ~ dgamma(1, .1)
  beta2 ~ dgamma(1, .1)
  tau ~ dgamma(.01, .01)
}
"
writeLines( modelString , con="model.jags" )

prestige <- df$prestige
income <- df$income
education <- df$education
N <- length(df$prestige)

data <- list(prestige=prestige, income=income, education=education, N=N)
params <- c("beta0", "beta1", "beta2")
inits1 <- list("beta0"=0, "beta1"=0, "beta2"=0)
inits2 <- list("beta0"=-7, "beta1"=1, "beta2"=4) # sorta from the OLS estimates
inits3 <- list("beta0"=10, "beta1"=10, "beta2"=10)
inits <- list(inits1, inits2, inits3)

set.seed(123)

fit <- jags(data = data, inits = inits,
                   parameters.to.save = params, n.chains = 3, n.iter = 9000,
                   n.burnin = 1000, model.file = "model.jags")

print(fit)
summary(lm(prestige ~ income + education, data=Prestige)) # to compare

fit.mcmc <- as.mcmc(fit)

summary(fit.mcmc)
plot(fit)
traceplot(fit)

xyplot(fit.mcmc)
xyplot(fit.mcmc, aspect="fill")
densityplot(fit.mcmc)
densityplot(fit.mcmc, aspect="fill")
autocorr.plot(fit.mcmc)

gelman.plot(fit.mcmc)  # Gelman-Rubin shrink factor (should approach 1 as the chains converge)
geweke.diag(fit.mcmc)  # z-scores comparing the mean of the early vs. late part of each chain
raftery.diag(fit.mcmc) # how many iterations you'd need to estimate quantiles accurately
heidel.diag(fit.mcmc)  # stationarity and half-width tests for each parameter

# one-stop shopping:
library(superdiag)
superdiag(fit.mcmc, burnin = 100)

library(mcmcplots)
denplot(fit.mcmc)
denplot(fit.mcmc, parms = c("beta0", "beta1", "beta2"))
traplot(fit.mcmc, parms = c("beta0", "beta1", "beta2"))
traplot(fit.mcmc[1], parms = c("beta0", "beta1", "beta2"))
traplot(fit.mcmc[2], parms = c("beta0", "beta1", "beta2"))
traplot(fit.mcmc[3], parms = c("beta0", "beta1", "beta2"))
mcmcplot(fit.mcmc) # whoa!!!

caterplot(fit.mcmc)
caterplot(fit.mcmc, parms = c("beta0", "beta1", "beta2"),
          labels = c("Intercept", "income", "education"))

library(ggmcmc) # pretty :)
fit.gg <- ggs(fit.mcmc)
ggs_density(fit.gg)
ggs_traceplot(fit.gg)
ggmcmc(fit.gg) # makes gorgeous compilation of plots saved as ggmcmc-output.pdf
# Produces density plots of the whole chain (black) with the density plot of only the latest part of the chain (green). By default, it uses the latest 10% of the chain as a comparison.

Bayesian SEM

Frequentist estimation of parameters in structural equation models requires large numbers of participants because of the large number of parameters in even relatively simple SEMs. To cajole models toward convergence, modelers often constrain certain parameters to 0, or to equal other parameters, sometimes based on a priori theory and sometimes based on criteria that could capitalize on chance.

Bayesian estimation of the parameters of the complex interdependencies modeled in SEMs can yield valuable information even with small samples that might not converge under FIML, and posterior parameter densities can reveal structure in the relations that may not be apparent from frequentist point estimates and standard errors.

I’ve collected below some literature, both theoretical and practical, on Bayesian structural equation models (BSEM).

  • An early (2005) chapter by David B. Dunson, Jesus Palomo, and Ken Bollen, Bayesian Structural Equation Modeling, gives a detailed explication of the math behind the matrix behind the SEM, pointing out all the parameters you might want to estimate. It’s dense, but there’s a fun example at the end.
  • For some theoretical background on BSEM that is geared toward practicality, check out the Mplus BSEM resource.
  • While you can use R and JAGS to do your BSEM, if you’re already familiar with Mplus, just check out the user guide for a practical walkthrough.
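If you’d rather stay in R and JAGS, here’s a minimal sketch of the simplest building block of an SEM: a one-factor measurement model (a CFA) with three indicators, written in the same R2jags style as the regression example above. None of this comes from the readings listed here, and the names (y, eta, lambda, nu) are made up for illustration.

# toy one-factor CFA: three indicators y[,1], y[,2], y[,3] loading on one latent factor
cfaString = "
model {
  for(i in 1:N){
    eta[i] ~ dnorm(0, 1) # latent factor; variance fixed to 1 for identification
    for(j in 1:3){
      y[i,j] ~ dnorm(mu[i,j], tau[j])
      mu[i,j] <- nu[j] + lambda[j] * eta[i]
    }
  }
  for(j in 1:3){
    nu[j] ~ dnorm(0, .001) # indicator intercepts
    lambda[j] ~ dnorm(0, .01) T(0,) # loadings, kept positive to pin down the factor's direction
    tau[j] ~ dgamma(.01, .01) # residual precisions
  }
}
"
writeLines(cfaString, con = "cfa.jags")
# then pass jags() a list like list(y = as.matrix(your.indicators), N = nrow(your.indicators))
# and monitor "lambda", "nu", and "tau", just as in the prestige example earlier.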

MLM in Bayes


# -------------------------------------------------
# get the L1 vars
# -------------------------------------------------
### receptive vocab (DV): PPVT
PPVT <- c("PPVTEPF", "PPVTEPS") ### prereading (DV): letter-word identification from WJIII WJ1_SSEPF: WJIII ENG PK F LETTER-WORD ID STAND SCORE preread <- c("WJ1_SSEPF", "WJ1_SSEPS") ### early math (DV): applied problems subtest of WJIII math <- c("WJ10_SSEPF", "WJ10_SSEPS") ### controls l1controls <- c("ICPSR_CLASS_ID", "CHGENP", "POORP") l1vars <- c(l1controls, PPVT, preread, math) # ------------------------------------------------- # get the L2 vars # ------------------------------------------------- ### teacher education TeachEd <- c("T_EDUCP", "TMAJORP") ### classroom quality: ECERTOTP: ECERS-R: PREK: TOTAL (ITEMS 1-10,12-36) PK ClassQual <- c("ECERTOTP") ### controls l2controls <- c("ICPSR_CLASS_ID", "STATE", "RATIOPSP", "CLS_SIZEPS", "HOURSWKP", "T_RACEP", "WHITEP", "PERPOORP", "LOCATIONP", "FULLDAYP") l2vars <- c(l2controls, TeachEd, ClassQual) # ------------------------------------------------- # prep the data file for jags # ------------------------------------------------- str(df) jags.dat <- df[order(df$ICPSR_CLASS_ID), ] # order by L2 units (classroom ID). jags.dat <- as.list(jags.dat[, l1vars[2:9]]) # selecting only the L1 vars to be used in the regressions jags.dat$CLASS <- as.numeric(as.ordered(df$ICPSR_CLASS_ID)) # save L2 unit counter. note that this must be numeric. str(jags.dat) #jags chokes on factors jags.dat$CHGENP <- as.numeric(jags.dat$CHGENP) - 1 jags.dat$POORP <- as.numeric(jags.dat$POORP) - 1 # some weirdness in the datafile... jags.dat$PPVTEPF <- as.numeric(jags.dat$PPVTEPF) jags.dat$WJ1_SSEPF <- as.numeric(jags.dat$WJ1_SSEPF) jags.dat$WJ10_SSEPF <- as.numeric(jags.dat$WJ10_SSEPF) summary(df[,l2vars]) # L2 vars should be the length of the number of L2 units! I have 704 classrooms. jags.dat$RATIOPSP <- as.vector(by(as.numeric(df$RATIOPSP), df$ICPSR_CLASS_ID, mean)) jags.dat$CLS_SIZEPS <- as.vector(by(as.numeric(df$CLS_SIZEPS), df$ICPSR_CLASS_ID, mean)) jags.dat$HOURSWKP <- as.vector(by(as.numeric(df$HOURSWKP), df$ICPSR_CLASS_ID, mean)) jags.dat$WHITEP <- as.vector(by(as.numeric(df$WHITEP), df$ICPSR_CLASS_ID, mean)) jags.dat$PERPOORP <- as.vector(by(as.numeric(df$PERPOORP), df$ICPSR_CLASS_ID, mean)) jags.dat$LOCATIONP <- as.vector(by(as.numeric(df$LOCATIONP), df$ICPSR_CLASS_ID, mean)) - 1 jags.dat$FULLDAYP <- as.vector(by(as.numeric(df$FULLDAYP), df$ICPSR_CLASS_ID, mean)) - 1 #jags.dat$T_EDUCP <- as.vector(by(as.numeric(df$T_EDUCP), df$ICPSR_CLASS_ID, mean)) # this needs to be dummy coded :( #jags.dat$TMAJORP <- as.vector(by(as.numeric(df$TMAJORP), df$ICPSR_CLASS_ID, mean)) # this needs to be dummy coded :( jags.dat$ECERTOTP <- as.vector(by(as.numeric(df$ECERTOTP), df$ICPSR_CLASS_ID, mean)) # need to add L1 n, and L2 n jags.dat$N.child <- nrow(df) jags.dat$N.class <- length(unique(df$ICPSR_CLASS_ID)) # check out the structure of the data file str(jags.dat) # ------------------------------------------------- # write the jags model # ------------------------------------------------- jags.model <- function(){ for (i in 1:N.child){ PPVTEPS[i] ~ dnorm (mu[i], tau) mu[i] <- b.class[CLASS[i]] + b.PPVTEPF * PPVTEPF[i] + b.CHGENP * CHGENP[i] + b.POORP * POORP[i] } b.PPVTEPF ~ dnorm(0, 0.01) b.CHGENP ~ dnorm(0, 0.01) b.POORP ~ dnorm (0, 0.01) for (j in 1:N.class){ b.class[j] ~ dnorm(b.class.hat[j], tau.class) b.class.hat[j] <- b.class0 + b.RATIOPSP * RATIOPSP[j] + b.CLS_SIZEPS * CLS_SIZEPS[j] + b.HOURSWKP * HOURSWKP[j] + b.WHITEP * WHITEP[j] + b.PERPOORP * PERPOORP[j] + b.LOCATIONP * LOCATIONP[j] + b.FULLDAYP * FULLDAYP[j] + b.ECERTOTP * ECERTOTP[j] } b.class.hat.mu <- 
mean(b.class.hat[]) b.class0 ~ dnorm(0,.01) b.RATIOPSP ~ dnorm(0,.01) b.CLS_SIZEPS ~ dnorm(0,.01) b.HOURSWKP ~ dnorm(0,.01) b.WHITEP ~ dnorm(0,.01) b.PERPOORP ~ dnorm(0,.01) b.LOCATIONP ~ dnorm(0,.01) b.FULLDAYP ~ dnorm(0,.01) b.ECERTOTP ~ dnorm(0,.01) tau ~ dgamma(1,1) sigma.class <- 1 / tau.class tau.class ~ dgamma(1,1) } # ------------------------------------------------- # set the parameters to track and run the model # ------------------------------------------------- params <- c(".PPVTEPF", "b.CHGENP", "b.POORP", "b.RATIOPSP", "b.CLS_SIZEPS", "b.HOURSWKP", "b.WHITEP", "b.PERPOORP", "b.LOCATIONP", "b.FULLDAYP", "b.ECERTOTP", "b.class") library(R2jags); set.seed(123) fit <- jags(data = jags.dat, inits = NULL, parameters.to.save = params, model.file = jags.model, n.chains = 2, n.iter = 2000, n.burnin = 1000) # Not working yet. :)

Bayes by hand

You may have come across the phrase “conjugate prior” in your reading. It’s an important concept historically, and in my opinion, working through a conjugate example by hand is a very practical way to solidify your understanding of how Bayes’ Theorem works.

The historical piece is that back in the day, before computers or whatever, the only practical way to do Bayesian analysis was with conjugate priors. That is to say, you needed your prior distribution and your likelihood to match up magically to produce a workable posterior. This webpage (http://www.johndcook.com/conjugate_prior_diagram.html) gives a list of several such magical pairs, and provides the density function (a PMF or PDF, depending on the distribution) for each. The reason these pairs of distributions are special (“conjugate”) is that when you multiply them together, they produce a new distribution that also happens to have a known, closed-form density. It is technically possible to multiply any two distributions together, of course, but a lot of the time the result is a weird, unique snowflake of a distribution that isn’t definable in terms of a nice, neat formula. If the posterior you get is a well-behaved distribution with a known density, it’s very easy to work with: you can take any value from that distribution and easily calculate its density, for example. If your posterior is a unique snowflake, you have no formula for getting those numbers out of it.

Nowadays, we don’t do it by hand at all: JAGS brute-forces a posterior distribution for us, regardless of how ugly and/or unique it might be, so it’s no longer necessary to stick to conjugate priors to do Bayesian analysis. This is still a nice problem to work through, though, since it makes you think about 1) what the likelihood function is, and 2) how exactly the information from your data gets combined with a prior distribution to produce your posterior. To see the example we walked through in club today, check out my answers to the first homework assignment from the super awesome summer class on Bayesian Modeling I took at ICPSR:
gamma-Poisson demonstration

Desmos graph showing the relationship between the prior and posterior distributions in this example (thank you, John!): https://www.desmos.com/calculator/hzt65i2top
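If you want to see the same conjugate update in code, here’s a quick R sketch (with made-up data rather than the numbers from the homework): a Gamma(a, b) prior on a Poisson rate combines with n observed counts to give a Gamma(a + sum(y), b + n) posterior, and a brute-force grid approximation of prior times likelihood lands right on top of that analytic answer.

# gamma-Poisson conjugacy, checked numerically with toy data
set.seed(123)
y <- rpois(20, lambda = 3) # some fake Poisson counts
a <- 2; b <- 1             # Gamma(shape = a, rate = b) prior on the Poisson rate

# conjugate update: posterior is Gamma(a + sum(y), b + n)
a.post <- a + sum(y)
b.post <- b + length(y)

# brute-force check: evaluate prior x likelihood on a grid and normalize it to a density
lambda.grid <- seq(0.01, 10, length.out = 1000)
unnorm <- dgamma(lambda.grid, a, b) *
  sapply(lambda.grid, function(l) prod(dpois(y, l)))
grid.post <- unnorm / (sum(unnorm) * diff(lambda.grid)[1])

# the dashed grid approximation should sit right on top of the analytic posterior
plot(lambda.grid, dgamma(lambda.grid, a.post, b.post), type = "l",
     xlab = "lambda", ylab = "density")
lines(lambda.grid, grid.post, col = "red", lty = 2)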

Homework! If you want additional practice (and why wouldn’t you, right?), show that the beta and binomial are another conjugate pair. See the new post on doing Bayes by hand for the answers.

Bayes Club Agenda

Hi all,

For our first meeting this term (today, 3:00-4:20 in Franklin 271A), we’ll go over the following topics:

1. Quick-n-dirty intro to Bayes: A refresher for those of us who haven’t been thinking about this since Spring, and a baptism by fire for those of us who are brand new. 😉 We’ll touch on some basic vocab and concepts, including possibly using this demonstration: http://setosa.io/conditional/

2. Planning this term’s schedule: The goal is to get a list of mini presentations lined up (“nuggets”, for those of you familiar with this practice in R Club). We’d love everyone to sign up for at least one little informal presentation. You can plan to talk about an article, a question or a problem you’ve run into, or you can plan to share something you’ve learned or figured out that you think would be of interest to the group. It can be long and complicated, or very quick and basic. The idea is to give each of us an excuse to do a little prep/thinking/reading/futzing with Bayes stuff outside of Bayes club at least once over the course of the term. The best way to learn is to do, after all!

3. Today’s nugget: Bayes by hand! Yep, over the summer, I (Rose) learned how to actually do Bayesian analysis by hand, and with real distributions (not just toy examples with coin flips). It’s a mess, but it was super helpful for me in terms of building my understanding of how Bayes’ theorem actually works in a practical sense, so I figured I would try it as a demonstration for Bayes Club. Bring a pencil, and brace yourself.

Bayes Club starts next week!

The first meeting of Bayes Club Fall 2014 will be Tuesday 10/7, 3:00-4:20pm in Franklin 271A. We hope to see you there!

To whet your appetite, consider reading this recent NYT article: http://www.nytimes.com/2014/09/30/science/the-odds-continually-updated.html It’s about how Bayesian inference is all hot in stats right now, and everyone’s all like, “Ooo you can do Bayesy stuff? You’re soooo cool!” It also gives a nice, very general-audience description of the difference between frequentist and Bayesian stats (with a coin-flip example, of course. What is it with the coin flips?).

For another excellent piece of broad-audience reading, check out this article in the Guardian: http://www.theguardian.com/science/life-and-physics/2014/sep/28/belief-bias-and-bayes?CMP=twt_gu It gives my favorite (to date) common sense explanation of Bayes’ Theorem – and it does so without referencing a coin!

If you’re still trying to decide whether or not you want to officially register for Bayes Club (vs. just showing up casually), check out our recent post on registration.