
Bayes Club & | R Club

Play choose-your-own-adventure with our very own data science club line-up! You come to a fork in the road. Do you take it? If so, turn to page round(runif(1, 1, 100*pi)).

Seriously, though: we have pretty much no days left to make this decision, so please respond to the survey below as soon as you can. It’s only a few questions long. And thank you so much for taking the time to do so — it’ll be really helpful!

A Quick Bayes’ Theorem Reference Tool in Python

UPDATE 2015-02-16: I’ve added a conceptual explanation of this code here.

Here’s a quick script that you can use (e.g., via a bash “alias” file, which sets up shortcuts for the Unix-style command line) for making decisions with Bayes. It’s based on Nate Silver’s version of Bayes’ Theorem. It can be run with python /path/to/this/script.py.
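
For example, adding a line like this to your alias file (the alias name bayes is just a suggestion; adjust the path to wherever you save the script) lets you invoke it from anywhere:

alias bayes='python /path/to/this/script.py'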


#!/usr/bin/env python
# The above line just tells the interpreter that this is a Python script.

# Jacob Levernier, 2013
# This code is released CC0 (https://creativecommons.org/publicdomain/zero/1.0/) (Public Domain)

# THIS APPLIES BAYES' THEOREM (AS PRINTED ON P. 247 FF. OF NATE SILVER'S BOOK /THE SIGNAL AND THE NOISE/):

print("BAYES' THEOREM:")

# Prompt for the three estimates the theorem needs.
# (See https://en.wikibooks.org/wiki/Python_Programming/Input_and_output and https://docs.python.org/3/library/functions.html#input)
x = float(input('Make an estimate: How likely (out of 1.00) would you have said it was for your hypothesis to be correct, before seeing this new information (the "prior")? '))

y = float(input('How likely (out of 1.00) do you think it is to see these results if your hypothesized explanation is correct? '))

z = float(input('How likely (out of 1.00) do you think it is to see these results if your hypothesized explanation is NOT correct? '))

# Silver's form of Bayes' theorem: posterior = xy / (xy + z(1 - x))
posterior_probability = (x * y) / ((x * y) + (z * (1 - x)))

print('The revised estimate (the "posterior") of how likely (out of 1.00) it is that your hypothesis is correct, given these results, is', posterior_probability)

End of term reminder

Just a friendly reminder, since we’re approaching the end of the term. If you’re taking Bayes Club for credit, you need to:

1) attend Bayes Club (we’ve been keeping track of attendance, so you’re covered there)

2) send a one-page-ish write-up to Sanjay (sanjay@uoregon.edu) by the end of the term, describing what you’ve done and/or what you’ve learned in Bayes Club this term.

If you are signed up for credit and you think you’ll have trouble meeting one or both of these requirements, email Sanjay.

From Child Development: A gentle intro to Bayesian stats

This article could be the ground floor for your Bayesian skyscraper: it’s clearly written, conceptually precise, and beautifully illustrated with empirical examples and practical guides (in the supplementary materials).

A Gentle Introduction to Bayesian Analysis: Applications to Developmental Research

Rens van de Schoot, David Kaplan, Jaap Denissen, Jens B. Asendorpf, Franz J. Neyer, Marcel A.G. van Aken

Bayesian statistical methods are becoming ever more popular in applied and fundamental research. In this study a gentle introduction to Bayesian analysis is provided. It is shown under what circumstances it is attractive to use Bayesian estimation, and how to interpret properly the results. First, the ingredients underlying Bayesian methods are introduced using a simplified example. Thereafter, the advantages and pitfalls of the specification of prior knowledge are discussed. To illustrate Bayesian methods explained in this study, in a second example a series of studies that examine the theoretical framework of dynamic interactionism are considered. In the Discussion the advantages and disadvantages of using Bayesian statistics are reviewed, and guidelines on how to report on Bayesian statistics are provided.

“…it is clear that it is not possible to think about learning from experience and acting on it without coming to terms with Bayes’ theorem.” (Jerome Cornfield, quoted in De Finetti, 1974a)

Bayesian SEM

Frequentist estimation of parameters in structural equation models requires large numbers of participants, because even relatively simple SEMs have many parameters. To cajole models toward convergence, modelers often constrain certain parameters to 0, or to equal other parameters – sometimes based on a priori theory, and sometimes based on criteria that could capitalize on chance.

Bayesian estimation of the parameters of the complex interdependencies modeled in SEMs can yield valuable information even with small samples that might not converge under FIML, and posterior parameter densities can illuminate the structure of relations that may not be apparent from frequentist parameter and standard-error estimates.

I’ve collected below some literature, both theoretical and practical, regarding Bayesian structural equation models.

  • An early (2005) chapter by David B. Dunson, Jesus Palomo, and Ken Bollen, Bayesian Structural Equation Modeling, gives a detailed explication of the math behind the matrix behind the SEM, pointing out all the parameters you might want to estimate. It’s dense, but there’s a fun example at the end.
  • For some theoretical background on BSEM that is geared toward practicality, check out the Mplus BSEM resource.
  • While you can use R and JAGS to do your BSEM (see the sketch just after this list), if you’re already familiar with Mplus, just check out the user guide for a practical guide.
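
If you do go the R route, here’s a minimal sketch using the blavaan package (my suggestion, not something from the resources above — blavaan wraps a JAGS/Stan backend around lavaan-style model syntax, and all variable names below are made up):

# A toy Bayesian SEM: one latent factor with three indicators,
# regressed on one observed predictor. Default priors are used throughout.
library(blavaan)

model <- '
  eta =~ y1 + y2 + y3  # measurement model: latent eta with three indicators
  eta ~ x              # structural model: eta regressed on predictor x
'

fit <- bsem(model, data = mydata) # mydata: a hypothetical data frame with y1-y3 and x
summary(fit)                      # posterior means, SDs, and credible intervals

With small samples, it’s worth looking at the full posterior densities from a fit like this rather than relying only on point summaries.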

MLM in Bayes


# -------------------------------------------------
# get the L1 vars
# -------------------------------------------------
### receptive vocab (DV): PPVT
PPVT <- c("PPVTEPF", "PPVTEPS") ### prereading (DV): letter-word identification from WJIII WJ1_SSEPF: WJIII ENG PK F LETTER-WORD ID STAND SCORE preread <- c("WJ1_SSEPF", "WJ1_SSEPS") ### early math (DV): applied problems subtest of WJIII math <- c("WJ10_SSEPF", "WJ10_SSEPS") ### controls l1controls <- c("ICPSR_CLASS_ID", "CHGENP", "POORP") l1vars <- c(l1controls, PPVT, preread, math) # ------------------------------------------------- # get the L2 vars # ------------------------------------------------- ### teacher education TeachEd <- c("T_EDUCP", "TMAJORP") ### classroom quality: ECERTOTP: ECERS-R: PREK: TOTAL (ITEMS 1-10,12-36) PK ClassQual <- c("ECERTOTP") ### controls l2controls <- c("ICPSR_CLASS_ID", "STATE", "RATIOPSP", "CLS_SIZEPS", "HOURSWKP", "T_RACEP", "WHITEP", "PERPOORP", "LOCATIONP", "FULLDAYP") l2vars <- c(l2controls, TeachEd, ClassQual) # ------------------------------------------------- # prep the data file for jags # ------------------------------------------------- str(df) jags.dat <- df[order(df$ICPSR_CLASS_ID), ] # order by L2 units (classroom ID). jags.dat <- as.list(jags.dat[, l1vars[2:9]]) # selecting only the L1 vars to be used in the regressions jags.dat$CLASS <- as.numeric(as.ordered(df$ICPSR_CLASS_ID)) # save L2 unit counter. note that this must be numeric. str(jags.dat) #jags chokes on factors jags.dat$CHGENP <- as.numeric(jags.dat$CHGENP) - 1 jags.dat$POORP <- as.numeric(jags.dat$POORP) - 1 # some weirdness in the datafile... jags.dat$PPVTEPF <- as.numeric(jags.dat$PPVTEPF) jags.dat$WJ1_SSEPF <- as.numeric(jags.dat$WJ1_SSEPF) jags.dat$WJ10_SSEPF <- as.numeric(jags.dat$WJ10_SSEPF) summary(df[,l2vars]) # L2 vars should be the length of the number of L2 units! I have 704 classrooms. jags.dat$RATIOPSP <- as.vector(by(as.numeric(df$RATIOPSP), df$ICPSR_CLASS_ID, mean)) jags.dat$CLS_SIZEPS <- as.vector(by(as.numeric(df$CLS_SIZEPS), df$ICPSR_CLASS_ID, mean)) jags.dat$HOURSWKP <- as.vector(by(as.numeric(df$HOURSWKP), df$ICPSR_CLASS_ID, mean)) jags.dat$WHITEP <- as.vector(by(as.numeric(df$WHITEP), df$ICPSR_CLASS_ID, mean)) jags.dat$PERPOORP <- as.vector(by(as.numeric(df$PERPOORP), df$ICPSR_CLASS_ID, mean)) jags.dat$LOCATIONP <- as.vector(by(as.numeric(df$LOCATIONP), df$ICPSR_CLASS_ID, mean)) - 1 jags.dat$FULLDAYP <- as.vector(by(as.numeric(df$FULLDAYP), df$ICPSR_CLASS_ID, mean)) - 1 #jags.dat$T_EDUCP <- as.vector(by(as.numeric(df$T_EDUCP), df$ICPSR_CLASS_ID, mean)) # this needs to be dummy coded :( #jags.dat$TMAJORP <- as.vector(by(as.numeric(df$TMAJORP), df$ICPSR_CLASS_ID, mean)) # this needs to be dummy coded :( jags.dat$ECERTOTP <- as.vector(by(as.numeric(df$ECERTOTP), df$ICPSR_CLASS_ID, mean)) # need to add L1 n, and L2 n jags.dat$N.child <- nrow(df) jags.dat$N.class <- length(unique(df$ICPSR_CLASS_ID)) # check out the structure of the data file str(jags.dat) # ------------------------------------------------- # write the jags model # ------------------------------------------------- jags.model <- function(){ for (i in 1:N.child){ PPVTEPS[i] ~ dnorm (mu[i], tau) mu[i] <- b.class[CLASS[i]] + b.PPVTEPF * PPVTEPF[i] + b.CHGENP * CHGENP[i] + b.POORP * POORP[i] } b.PPVTEPF ~ dnorm(0, 0.01) b.CHGENP ~ dnorm(0, 0.01) b.POORP ~ dnorm (0, 0.01) for (j in 1:N.class){ b.class[j] ~ dnorm(b.class.hat[j], tau.class) b.class.hat[j] <- b.class0 + b.RATIOPSP * RATIOPSP[j] + b.CLS_SIZEPS * CLS_SIZEPS[j] + b.HOURSWKP * HOURSWKP[j] + b.WHITEP * WHITEP[j] + b.PERPOORP * PERPOORP[j] + b.LOCATIONP * LOCATIONP[j] + b.FULLDAYP * FULLDAYP[j] + b.ECERTOTP * ECERTOTP[j] } b.class.hat.mu <- 
mean(b.class.hat[]) b.class0 ~ dnorm(0,.01) b.RATIOPSP ~ dnorm(0,.01) b.CLS_SIZEPS ~ dnorm(0,.01) b.HOURSWKP ~ dnorm(0,.01) b.WHITEP ~ dnorm(0,.01) b.PERPOORP ~ dnorm(0,.01) b.LOCATIONP ~ dnorm(0,.01) b.FULLDAYP ~ dnorm(0,.01) b.ECERTOTP ~ dnorm(0,.01) tau ~ dgamma(1,1) sigma.class <- 1 / tau.class tau.class ~ dgamma(1,1) } # ------------------------------------------------- # set the parameters to track and run the model # ------------------------------------------------- params <- c(".PPVTEPF", "b.CHGENP", "b.POORP", "b.RATIOPSP", "b.CLS_SIZEPS", "b.HOURSWKP", "b.WHITEP", "b.PERPOORP", "b.LOCATIONP", "b.FULLDAYP", "b.ECERTOTP", "b.class") library(R2jags); set.seed(123) fit <- jags(data = jags.dat, inits = NULL, parameters.to.save = params, model.file = jags.model, n.chains = 2, n.iter = 2000, n.burnin = 1000) # Not working yet. :)
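
Once that jags() call runs without complaint, here’s a quick first look at what came out (a sketch using the R2jags fit object created above):

print(fit)     # posterior summaries for each tracked parameter, plus Rhat and n.eff
traceplot(fit) # trace plots: the two chains should overlap and mix well

Rhat near 1.00 for every parameter is the usual quick convergence check; with only 2,000 iterations, some of the 704 b.class intercepts may well need a longer run.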

End of term reminder

Just a friendly reminder, since we’re approaching the end of spring term. If you’re taking Bayes Club for credit, you need to:

1) attend Bayes Club (we’ve been keeping track of attendance, so you’re covered there)

2) send a one-page-ish write-up to Sanjay (sanjay@uoregon.edu) by the end of the term, describing what you’ve done and/or what you’ve learned in Bayes Club this term.

If you are signed up for credit and you think you’ll have trouble meeting one or both of these requirements, email Sanjay.

Sampler Exampler

x <- rnorm(10) # what's the mean of x?

# i'm a sampler!!

# try beta = 1
beta <- 1
# get the probability of each observation given my model (i.e. "the data are
# normally distributed") and a parameter estimate ("beta = 1")
p <- dnorm(x, mean = beta)
# assuming the observations are independent, the probability of all of the data
# for a particular beta value is the probability of each observation multiplied together
PofD_B1 <- prod(p)

# try beta = 0
beta <- 0
p <- dnorm(x, mean = beta)
PofD_B0 <- prod(p)

# beta = 0 looks better than beta = 1. I'll move to beta = 0 and keep looking around there.

# try lots of betas!
# this is a dumb sampler, since it's just chugging through betas rather than choosing
# them based on probability, but it demonstrates how this would result in a
# likelihood distribution.
betas <- seq(-2, 2, .1)
PofD_B <- 1:length(betas)
for (i in 1:length(betas)) {
  beta <- betas[i]
  p <- dnorm(x, mean = beta)
  PofD_B[i] <- prod(p)
}
results <- cbind(betas, PofD_B)
plot(PofD_B ~ betas, type = "l")
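
For contrast, here’s a minimal sketch of a sampler that does choose betas based on probability: a random-walk Metropolis sampler for the same one-parameter model (the proposal width of 0.5 and the iteration count are arbitrary choices):

set.seed(1)
x <- rnorm(10)

# log-likelihood of the data for a given beta (logs avoid the underflow that prod() invites)
log_lik <- function(beta) sum(dnorm(x, mean = beta, log = TRUE))

n_iter <- 5000
chain <- numeric(n_iter)
beta <- 0 # starting value
for (i in 1:n_iter) {
  proposal <- beta + rnorm(1, sd = 0.5) # propose a beta near the current one
  # accept with probability min(1, lik(proposal) / lik(current)); otherwise stay put
  if (log(runif(1)) < log_lik(proposal) - log_lik(beta)) {
    beta <- proposal
  }
  chain[i] <- beta
}

hist(chain, breaks = 40) # the samples pile up around the most plausible betas

Unlike the grid search above, the chain spends its time in proportion to how probable each beta is, which is the core idea behind the MCMC samplers that JAGS runs for you.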