Plotting your logistic regression models

This code is all available on Rose’s github: https://github.com/rosemm/rexamples/blob/master/logistic_regression_plotting_part1.Rmd






Plotting the results of your logistic regression Part 1: Continuous by categorical interaction

We’ll run a nice, complicated logistic regresison and then make a plot that highlights a continuous by categorical interaction.

set.seed(24601) # setting this so the random results will be repeatable 

library(MASS)
covmat <- matrix(c(1.0,   0.2,   0.6, 
                   0.2,   1.0,  -0.5, 
                   0.6,  -0.5,   1.0), nrow=3) # the true cov matrix for my data
data <- mvrnorm(300, mu=c(0,0,0), Sigma=covmat) # generate random data that match that cov matrix
colnames(data) <- c("X1", "X2", "DV")
data <- as.data.frame(data)
data$group <- gl(n=3, k=ceiling(nrow(data)/3), labels=c("a", "b", "c", "d"))
# add some group differences and interaction stuff...
data$DV <- with(data, ifelse(group=="c" & X1 > 0, DV+rnorm(n=1, mean=1), 
                             ifelse(group=="b" & X1 > 0, DV+rnorm(n=1, mean=2) , DV)))
# make DV binary
data$DV <- ifelse(data$DV > 0, 1, 0)
head(data)
##            X1         X2 DV group
## 1  0.75191377  0.2978933  1     a
## 2 -1.45574399 -0.6868934  0     a
## 3 -0.07297941  0.2450438  0     a
## 4  0.91090328 -1.6934376  1     a
## 5  1.19905258  0.4765307  0     a
## 6  0.42466903 -0.9937526  1     a

Get the coefficients from your logistic regression model

First, whenever you’re using a categorical predictor in a model in R (or anywhere else, for that matter), make sure you know how it’s being coded!! For this example, we want it dummy coded (so we can easily plug in 0’s and 1’s to get equations for the different groups). This is called contr.treatment() in R.

contrasts(data$group)
##   b c d
## a 0 0 0
## b 1 0 0
## c 0 1 0
## d 0 0 1
# Great, that's what we wanted. And we can see that a is the reference group.
# If you want to change what contrasts will run, you can add an argument to the glm() call. For example contrasts="contr.treatment" will make it traditional dummy coding if it isn't already.

Now we can run that model.

# note this use of exponent in a formula will give us all 2-way interactions
model <- glm(DV ~ (X1 + X2 + group)^2, 
             data=data, na.action="na.exclude",  family="binomial") 
             
summary(model)
## 
## Call:
## glm(formula = DV ~ (X1 + X2 + group)^2, family = "binomial", 
##     data = data, na.action = "na.exclude")
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.44094  -0.45991   0.04136   0.52301   2.74705  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.5873     0.3438  -1.708 0.087631 .  
## X1            2.6508     0.5592   4.740 2.13e-06 ***
## X2           -2.2599     0.4977  -4.540 5.61e-06 ***
## groupb        2.2111     0.5949   3.717 0.000202 ***
## groupc        0.6650     0.4131   1.610 0.107456    
## X1:X2         0.1201     0.2660   0.452 0.651534    
## X1:groupb     2.7323     1.2977   2.105 0.035253 *  
## X1:groupc    -0.6816     0.7078  -0.963 0.335531    
## X2:groupb     0.8477     0.7320   1.158 0.246882    
## X2:groupc     0.4683     0.6558   0.714 0.475165    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 412.88  on 299  degrees of freedom
## Residual deviance: 205.46  on 290  degrees of freedom
## AIC: 225.46
## 
## Number of Fisher Scoring iterations: 7
model$coef
## (Intercept)          X1          X2      groupb      groupc       X1:X2 
##  -0.5872841   2.6508212  -2.2599250   2.2110951   0.6649971   0.1201166 
##   X1:groupb   X1:groupc   X2:groupb   X2:groupc 
##   2.7323113  -0.6816327   0.8476695   0.4682893
# save the coefficient values so we can use them in the equations
b0 <- model$coef[1] # intercept
X1 <- model$coef[2]
X2 <- -model$coef[3]
groupb <- model$coef[4]
groupc <- model$coef[5]
X1.X2 <- model$coef[6]
X1.groupb <- model$coef[7]
X1.groupc <- model$coef[8]
X2.groupb <- model$coef[9]
X2.groupc <- model$coef[10]

Note: If you were working in SPSS (or for some other reason you have run a model but can’t generate a plot for it), you can enter in your coefficients here, like this:

b0 <- -0.5872841 # intercept
X1 <- 2.6508212
X2 <- -2.2599250
groupb <- 2.2110951
groupc <- 0.6649971
X1.X2 <- 0.1201166
X1.groupb <- 2.7323113
X1.groupc <- -0.6816327
X2.groupb <- 0.8476695
X2.groupc <- 0.4682893

Calculate probabilities for the plot

First, decide what variable you want on your x-axis. That’s the only variable we’ll enter as a whole range. (The range we set here will determine the range on the x-axis of the final plot, by the way.)

X1_range <- seq(from=min(data$X1), to=max(data$X1), by=.01)

Next, compute the equations for each group in logit terms. These equations need to include every coefficient for the model you ran. You’ll need to plug in values for all but one variable – whichever variable you decided will be displayed on the x-axis of your plot. You make a separate equation for each group by plugging in different values for the group dummy codes.

X2_val <- mean(data$X2) # by plugging in the mean as the value for X2, I'll be generating plots that show the relationship between X1 and the outcome "for someone with an average X2".

a_logits <- b0 + 
  X1*X1_range + 
  X2*X2_val + 
  groupb*0 + 
  groupc*0 + 
  X1.X2*X1_range*X2_val + 
  X1.groupb*X1_range*0 + 
  X1.groupc*X1_range*0 + 
  X2.groupb*X2_val*0 + 
  X2.groupc*X2_val*0 # the reference group

b_logits <- b0 + 
  X1*X1_range + 
  X2*X2_val + 
  groupb*1 + 
  groupc*0 + 
  X1.X2*X1_range*X2_val + 
  X1.groupb*X1_range*1 + 
  X1.groupc*X1_range*0 + 
  X2.groupb*X2_val*1 + 
  X2.groupc*X2_val*0

c_logits <- b0 + 
  X1*X1_range + 
  X2*X2_val + 
  groupb*0 + 
  groupc*1 + 
  X1.X2*X1_range*X2_val + 
  X1.groupb*X1_range*0 + 
  X1.groupc*X1_range*1 + 
  X2.groupb*X2_val*0 + 
  X2.groupc*X2_val*1

# Compute the probibilities (this is what will actually get plotted):
a_probs <- exp(a_logits)/(1 + exp(a_logits))
b_probs <- exp(b_logits)/(1 + exp(b_logits))
c_probs <- exp(c_logits)/(1 + exp(c_logits))

Plot time!

# We'll start by plotting the ref group:
plot(X1_range, a_probs, 
     ylim=c(0,1),
     type="l", 
     lwd=3, 
     lty=2, 
     col="gold", 
     xlab="X1", ylab="P(outcome)", main="Probability of super important outcome")


# Add the line for people who are in the b group
lines(X1_range, b_probs, 
      type="l", 
      lwd=3, 
      lty=3, 
      col="turquoise2")

# Add the line for people who are in the c group
lines(X1_range, c_probs, 
      type="l", 
      lwd=3, 
      lty=4, 
      col="orangered")

# add a horizontal line at p=.5
abline(h=.5, lty=2)

Or, you can do it in ggplot2!

library(ggplot2); library(tidyr)
# first you have to get the information into a long dataframe, which is what ggplot likes :)
plot.data <- data.frame(a=a_probs, b=b_probs, c=c_probs, X1=X1_range)
plot.data <- gather(plot.data, key=group, value=prob, a:c)
head(plot.data)
##          X1 group         prob
## 1 -2.770265     a 0.0003264137
## 2 -2.760265     a 0.0003351590
## 3 -2.750265     a 0.0003441385
## 4 -2.740265     a 0.0003533586
## 5 -2.730265     a 0.0003628255
## 6 -2.720265     a 0.0003725460
ggplot(plot.data, aes(x=X1, y=prob, color=group)) + # asking it to set the color by the variable "group" is what makes it draw three different lines
  geom_line(lwd=2) + 
  labs(x="X1", y="P(outcome)", title="Probability of super important outcome") 



13 comments

  1. Martin

    Excellent post. I have been looking for such a clear explanation for hours. I have only one quick follow-up question: Could you (or anybody else) explain me how one can add a legend to the first plot?

    • rosem@uoregon.edu

      Thanks! To add a legend to a base R plot (the first plot is in base R), use the function legend. You have to enter all of the information for it (the names of the factor levels, the colors, etc.) manually. Here’s a nice tutorial . If you use the ggplot2 code instead, it builds the legend for you automatically.

  2. Carter

    Nice plots. It would be nice if you would add a nice real world interpretation of each line. This plot is useful but explain it in plain language to someone with less maths background and you have a winner in the business world. How do the groups compare to each other? What does a faster vertical slope mean? etc.

  3. Laura Picchi

    Hi, this is a really useful post! Anyway I would like to know if this script can be used even with mixed effects models (glmer formula).
    Thank you in advance for your answer

  4. Dan

    Hi, this is incredibly helpful and glad I stumbled upon it. Thank you for posting. I was wondering if you could provide some guidance for including the confidence interval. Again, thank you!

  5. Josh

    Awesome tutorial, inspired me to write a general function to do the same. If it helps anyone:

    #Helper function for primary probability function

    FUN.PROBS_SECONDARY <- function(X, Y, Z1, Z2, group){
    X[grep(group, names(X))] <- 1
    VAL <- as.numeric(X %*% Y) + Z1 * Z2
    VAL <- tibble(X = Z1, PROB = exp(VAL)/(1 + exp(VAL)), GROUP = group)
    return(VAL)
    }

    #Function to calculate probabilities from a logistic model which are grouped over 'GROUP_NAME' and vary over #'VAR'; all other factors will be assumed to take on reference level

    FUN.PROBS_PRIMARY <- function(VAR, VAR_RANGE, GROUP_NAME, MDL){
    X <- rep(0, length(MDL$coefficients))
    names(X) <- names(MDL$coefficients)
    X <- X[names(X) != VAR]
    X[['(Intercept)']] <- 1
    Y <- MDL$coefficients[names(X)]
    VAR_COEF <- MDL$coefficients[[VAR]]
    GROUP <- as.character(unique(MDL$data[[GROUP_NAME]]))
    names(GROUP) <- GROUP
    VAL % ggplot(aes(x=X, y=PROB, color = GROUP)) + geom_line()