How should I interpret significant results for…

…an F test? Are you comparing just 2 means? If so, interpret it just like a t-test: You must interpret the direction of the effect (glance at the group means to see which is higher if you don’t already know). If you have more than 2 means, then the F test tells you there is one or more significant differences somewhere among the means, but does NOT tell you where the significant difference(s) are. If this is a one-way ANOVA, you’ll want to run contrasts to find out. If this is a factorial ANOVA, see the information below about main effects, interactions, simple effects, and contrasts.
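To make the omnibus F test concrete, here is a minimal Python sketch of the one-way ANOVA calculation. The scores and the helper names (`mean`, `one_way_f`) are made up for illustration; in practice you would let your stats package do this and report the p-value too.

```python
# Hypothetical scores for three groups (made-up numbers, illustration only).
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [9.0, 10.0, 11.0],
}


def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_scores = [x for xs in groups.values() for x in xs]
    grand_mean = mean(all_scores)

    # Between-groups sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(
        len(xs) * (mean(xs) - grand_mean) ** 2 for xs in groups.values()
    )
    # Within-groups sum of squares: spread of scores around their own group mean.
    ss_within = sum(
        (x - mean(xs)) ** 2 for xs in groups.values() for x in xs
    )

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = one_way_f(groups)
print(f_stat, df_between, df_within)
```

For these made-up scores F works out to 19 on 2 and 6 degrees of freedom. Notice that nothing in the calculation says which groups differ; that is exactly why you need contrasts afterwards.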

…a main effect (in factorial ANOVA)? This is just another F test, but it compares row means or column means (which average across individual groups) rather than comparing group means (i.e. cell means) individually. Are you comparing only 2 row means, or column means? If so, interpret the direction of the effect. If you are comparing more than 2 row/column means, then the significant main effect tells you there is one or more significant differences somewhere among the row or column means you tested, but does NOT tell you where the significant difference(s) are.

  • Note that since main effects only test combined means (row or column averages), you can’t always say for sure what’s happening with the cell means based on main effects. For example, if you have a 2×3 ANOVA examining gender (M or F) and treatment (A, B or C), if you see a significant main effect of gender such that men score higher than women, that only means that men’s scores averaged across all three treatments are higher than women’s scores averaged across all three treatments. It’s possible that one of the treatments worked quite differently than the other two (maybe women score higher than men in treatment A) – that’s an interaction. Saying simply “there was a significant main effect of gender such that men scored higher than women” is misleading because it’s not true in all cases. If you have a significant main effect and also a significant interaction, you need to interpret the interaction to decide whether or not the main effect is still meaningful.
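Here is a small Python sketch of how a main effect can hide what is going on in the cells. The cell means are invented for illustration, and equal cell sizes are assumed so that a row mean is just the plain average of its cell means.

```python
# Hypothetical cell means for a 2x3 design (equal cell sizes assumed).
cell_means = {
    "M": {"A": 4.0, "B": 9.0, "C": 11.0},
    "F": {"A": 6.0, "B": 5.0, "C": 7.0},
}

# Row (marginal) means: each gender averaged across the three treatments.
# These are what the main effect of gender compares.
row_means = {
    gender: sum(by_treatment.values()) / len(by_treatment)
    for gender, by_treatment in cell_means.items()
}

# Men minus women within each treatment: the cell-level comparison.
gap_by_treatment = {
    treatment: cell_means["M"][treatment] - cell_means["F"][treatment]
    for treatment in ("A", "B", "C")
}

print(row_means)
print(gap_by_treatment)
```

With these numbers the row mean for men is higher than the row mean for women, yet the gap is negative in treatment A (women score higher there). The marginal means alone would never show you that.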

…an interaction (in factorial ANOVA)? This is also just another F test, testing whether there are any significant differences among the cell means after factoring out row and column effects. Since factorial ANOVAs will always have more than 2 cell means (the simplest factorial ANOVA is 2×2, so you’ll always have at least 4 cell means), you can never interpret the direction of the effect for an interaction without running more tests. The “more tests” you run are typically simple effects tests, and contrasts (if appropriate).
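One way to picture “after factoring out row and column effects” is to subtract the row mean and column mean from each cell mean and add back the grand mean. What is left over is what the interaction test is looking at. The sketch below assumes equal cell sizes; the tables and the function name `interaction_residuals` are hypothetical.

```python
def interaction_residuals(table):
    """Cell mean - row mean - column mean + grand mean, for every cell.

    `table` is a list of rows of cell means; equal cell sizes are assumed.
    """
    n_rows, n_cols = len(table), len(table[0])
    row_means = [sum(row) / n_cols for row in table]
    col_means = [
        sum(table[r][c] for r in range(n_rows)) / n_rows for c in range(n_cols)
    ]
    grand_mean = sum(row_means) / n_rows
    return [
        [table[r][c] - row_means[r] - col_means[c] + grand_mean
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Rows are genders (M, F); columns are treatments (A, B, C). Made-up means.
additive = [[4.0, 6.0, 8.0], [2.0, 4.0, 6.0]]        # same pattern in both rows
non_additive = [[6.0, 8.0, 10.0], [5.0, 5.0, 5.0]]   # pattern differs by row

print(interaction_residuals(additive))
print(interaction_residuals(non_additive))
```

In the additive table every residual is zero, so row and column effects explain the cell means completely. In the second table the residuals are nonzero, which is the kind of leftover pattern a significant interaction points to (the F test itself also needs the within-cell variability, which this sketch leaves out).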

…a simple effects test (in factorial ANOVA)? This is also just an F test. It’s actually a one-way ANOVA, comparing all the cell means in a particular row or column. So, if there are only 2 cell means in that row (or column), then you interpret the direction of your effect and there’s no more work to do (i.e. you won’t run contrasts on those two means – you already know whether they’re different, and if so which is higher. There’s nothing else to learn). If there are more than 2 cell means in that row/column, then the significant simple effects test tells you there is one or more significant differences somewhere among the means, but does NOT tell you where the significant difference(s) are. You’ll want to run contrasts to find out.

… contrasts? A contrast always compares only 2 means, so you always can (and must) interpret the direction of a significant contrast. Sometimes one or more of the means is a mean averaged across multiple groups, which is fine – for example, in a set of Helmert contrasts on treatments A, B, and C, one contrast would compare the average of A and B to C, and the second contrast would compare A and B. If both contrasts were significant, you would interpret the direction for the first by looking at the mean of groups A and B pooled together and the mean of C to see which is higher. You interpret the direction of the second contrast just by looking at the mean of A and the mean of B.

  • Note that in factorial ANOVAs, there are two common situations where you would find yourself wanting to run a contrast: To understand a significant main effect on more than 2 row or column means, or to understand a significant simple effects test on more than 2 cell means.
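The Helmert example above can be sketched in a few lines of Python. The group means are invented, the helper `contrast_value` is a hypothetical name, and equal group sizes are assumed so that the pooled mean of A and B is just their average. This only shows how to read the direction; the significance test itself is not computed here.

```python
def contrast_value(weights, means):
    """Weighted sum of group means for one contrast."""
    return sum(w * m for w, m in zip(weights, means))


# Hypothetical group means for treatments A, B, C (in that order).
means = [5.0, 7.0, 10.0]

# Helmert-style weights matching the description in the text.
pooled_ab_vs_c = [0.5, 0.5, -1.0]  # average of A and B compared to C
a_vs_b = [1.0, -1.0, 0.0]          # A compared to B

first = contrast_value(pooled_ab_vs_c, means)
second = contrast_value(a_vs_b, means)

# A negative value means the side with the negative weight has the higher mean.
print("A and B pooled vs. C:", first)
print("A vs. B:", second)
```

Both values come out negative with these numbers, so C is higher than A and B pooled, and B is higher than A. The sign of the contrast together with the weights tells you the direction.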

EXAMPLE TIME! You do a 2×3 ANOVA testing the effect of gender (M or F) and treatment (A, B, or C). Let’s say treatments A, B, and C refer to dosage levels for a new drug (A = low, B = medium, C = high dose).

1.    The factorial ANOVA is significant. You know the cell means are not all the same, but you don’t know how they differ.

2.    You have a significant main effect of gender. Since there are only two levels of gender (M or F), you can interpret the direction of the effect. You examine the mean for men averaged across all three treatments and see that it is higher than the mean for women averaged across all three treatments. You know all you can learn about the mean for men vs. women averaged across all three treatments: they’re significantly different, and men score higher.

3.    You have a significant main effect of treatment. You know the means for each treatment (A, B, and C) averaged across both genders are not all the same, but you don’t know how they differ.

4.    You have a significant interaction between gender and treatment. You know that the cell means for each gender-treatment combination (after accounting for row and column effects) are not all the same, but you don’t know how they differ. Importantly, the fact that you know there are still significant differences between some cell means beyond that explained by your main effects of gender and treatment (the row and column effects) makes you hesitant to assume your main effects apply across the board. You need to find out how this interaction works to see whether your main effects still make sense or not.

  • Note that at this point there are still several significant results we can’t fully interpret: the significant main effect of treatment (e.g. we still don’t know which treatment(s) worked the best!), and the interaction between gender and treatment (we know that the effect of treatment depends on gender, but we don’t know how it works). This lingering ambiguity motivates the next tests we run.

5.    First, let’s tackle that main effect of treatment. You want to know how the three column means for treatment (A averaged across both genders, B averaged across both genders, and C averaged across both genders) differ. Since the levels of treatment are meaningfully ordered (low, med, high), polynomial trend contrasts make sense. You get a significant linear contrast, and a non-significant quadratic contrast. You know that low (averaging across both genders) is different from high (averaging across both genders), and since this isn’t qualified by a quadratic trend you know that medium (averaging across both genders) is not significantly different from the average of low and high, suggesting that scores (averaging across both genders) increase as dosage increases, and that scores go up about the same amount for each increase in dose.
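As a rough illustration of the trend contrasts in step 5, here are the standard linear and quadratic weights for three equally spaced levels, applied to made-up column means. The variable names are hypothetical and no significance test is computed.

```python
def contrast_value(weights, means):
    """Weighted sum of means for one contrast."""
    return sum(w * m for w, m in zip(weights, means))


# Polynomial trend weights for three equally spaced, ordered levels.
LINEAR = [-1.0, 0.0, 1.0]     # high minus low
QUADRATIC = [1.0, -2.0, 1.0]  # low + high minus twice medium

# Hypothetical column means (low, medium, high), averaged across genders.
straight_line = [5.0, 7.0, 9.0]  # equal steps between doses
levels_off = [5.0, 9.0, 9.0]     # big jump, then flat

for label, column_means in (("straight", straight_line), ("levels off", levels_off)):
    linear = contrast_value(LINEAR, column_means)
    quadratic = contrast_value(QUADRATIC, column_means)
    print(label, "linear:", linear, "quadratic:", quadratic)
```

The quadratic value is twice the gap between the average of low and high and the medium mean, so it is zero when medium sits exactly halfway between them. That is why a significant linear contrast with a non-significant quadratic contrast reads as “scores go up by about the same amount at each step.”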

Now let’s work on the interaction…

6.    You have a significant simple effect of treatment at men. This is a one-way ANOVA comparing the means for men who got low dose (A), men who got medium dose (B), and men who got high dose (C). You know that there is one or more significant differences somewhere among the means, but the test does NOT tell you where the significant difference(s) are.

7.    You have a non-significant simple effect of treatment at women. This is a one-way ANOVA comparing the means for women who got low dose (A), women who got medium dose (B), and women who got high dose (C). You know that there are no significant differences among these means. That suggests that dosage level doesn’t affect women’s scores on this task (i.e. no matter what dosage they got, all of the groups scored about the same). Note that I’m accepting the null hypothesis here, which is sloppy – it’s also quite possible that there are real differences for women who get different dosage levels, but we don’t have a big enough sample here to detect the effect.

8.    You have a significant simple effect of gender at treatment A. Since there are only two levels of gender, there are only two cell means in the low-dose group: men and women. You examine the mean for men in treatment A and see that it is higher than the mean for women in treatment A. You know all you can learn about the mean for men vs. women in treatment A: they’re significantly different, and men score higher.

9.    You have a significant simple effect of gender at treatment B. You examine the mean for men in treatment B and see that it is higher than the mean for women in treatment B.

10. You have a significant simple effect of gender at treatment C. You examine the mean for men in treatment C and see that it is higher than the mean for women in treatment C.

Taking these three simple effect tests together (the simple effects of gender at each level of treatment), we can see that no matter which dosage group you examine, women score significantly lower than men. So now we know that our main effect of gender holds across all of the treatments.

  • Note that at this point we can interpret our main effects, but we’re not done with the interaction. We understand the direction of both of our main effects and we know that the main effect of gender is true across all the treatments. The interaction is messing with our main effect of treatment, though: we know treatment works differently in men and women because there is a significant effect of treatment within the men, but there isn’t a significant effect of treatment in women. That means the contrast we ran showing there’s a significant linear trend in treatment collapsing across genders (step 5) can’t be taken at face value: averaging across genders hides the fact that the effect of treatment shows up in men but not in women. We’re not done, though, because we still can’t fully interpret the interaction – we still don’t know how treatment works in the men, just that there are differences between the treatments A, B and C in men. Use contrasts to find out.

11. You run polynomial trend contrasts on treatment within men. You get a significant linear contrast, and a non-significant quadratic contrast. You know that, for men, low dose is different from high dose, and since this isn’t qualified by a quadratic trend you know that medium dose is not significantly different from the average of low and high, suggesting that scores for men increase as dosage increases, and that scores go up about the same amount for each increase in dose.

  • Okay. Now we’ve ironed out all the details. Here’s what we know: There is a significant effect of gender such that men score higher than women, and a significant effect of treatment which is qualified by an interaction between gender and treatment. There is not a simple effect of treatment within women, but there is within men such that scores increase linearly as dosage increases. Basically, it appears that men react to dosage such that the higher the dose they get, the higher their scores, whereas women always score lower than men and appear not to be affected by dosage. Ta-da!

What’s the difference between all the different kinds of orthogonal contrasts (Helmert, polynomial trend, etc.)?

There is no difference, really! For J group means, you can create J-1 orthogonal contrasts, but the particular contrasts that would be theoretically motivated will differ from study to study. For some reason (vanity?) people started naming a couple of common sets of orthogonal contrasts. There’s nothing fundamentally different about conducting Helmert contrasts vs. polynomial trend contrasts vs. some other set of orthogonal contrasts you invent. They’re all just contrast weights applied to group means. When you have a set of means you want to conduct contrasts on, just think about which comparisons would make sense theoretically and figure out a way you can elegantly get that information. When possible, you should construct orthogonal contrasts (but don’t stop yourself from testing an important question if it would mean non-orthogonal contrasts – orthogonality is good, but not vital). Maybe you’ll end up with a set of contrasts that has been named by somebody, maybe you won’t. It doesn’t matter at all. Just pay attention to your contrast weights and you’ll be able to interpret your results just fine.
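If you want to check whether a set of contrast weights you came up with is orthogonal, a quick sketch like the following works. It assumes equal group sizes (with unequal sizes the check needs to weight by n), and the function names are hypothetical.

```python
def sums_to_zero(weights, tol=1e-9):
    """A contrast's weights should sum to zero."""
    return abs(sum(weights)) < tol


def is_orthogonal_set(contrasts, tol=1e-9):
    """True if every pair of contrasts has a zero dot product (equal n assumed)."""
    for i in range(len(contrasts)):
        for j in range(i + 1, len(contrasts)):
            dot = sum(a * b for a, b in zip(contrasts[i], contrasts[j]))
            if abs(dot) > tol:
                return False
    return True


# Three sets of J - 1 = 2 contrasts on J = 3 group means.
helmert = [[0.5, 0.5, -1.0], [1.0, -1.0, 0.0]]
polynomial = [[-1.0, 0.0, 1.0], [1.0, -2.0, 1.0]]
adjacent_pairs = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]  # A vs. B, B vs. C

for name, contrast_set in (
    ("Helmert", helmert),
    ("polynomial", polynomial),
    ("adjacent pairs", adjacent_pairs),
):
    valid = all(sums_to_zero(w) for w in contrast_set)
    print(name, "valid:", valid, "orthogonal:", is_orthogonal_set(contrast_set))
```

The Helmert and polynomial sets pass the check, while the adjacent-pairs set does not: its weights are perfectly valid contrasts, just not orthogonal to each other. Named or not, it all comes down to the weights.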

For an excellent description of lots of different coding schemes – and all of the relevant code for using them in R! – see the contrast coding explanation from IDRE. Although beware: the contrasts() command is not as straightforward as the lovely folks at IDRE make it out to be. If you’re interested in trying this in R, be sure to also read my rpubs page on contrasts.

Also see this post by Nicole.