pacman::p_load(tidyverse,
cowplot,
emmeans,
rstatix,
ez)40 Factorial ANOVA - Post hoc analysis of Main effects
In the last walkthrough, analyzed a 2×2 ANOVA focusing on our main effects. Because each of our factors only had two levels, there was no need to follow up on significant main effects with pairwise post hoc analysis. In this walkthough we will take a look at examples where our main effects have 3 or more levels, and demand a post-hoc analysis. For now, let’s try another example. First let’s bring in the necessary packages:
40.1 Example: A 2 × 3 ANOVA
Thirty-six college students were randomly assigned to 3 groups (N=12). Students in each group were asked to watch and take notes on a 50 min lecture. One week later all students were tested on the content of their lectures, and their scores were compared. Groups differed by the lecture’s subject matter, where:
- 1 = Physics lecture
- 2 = Social Science lecture
- 3 = History lecture
The lectures were presented in two manners
1 = via computer
2 = standard method, lecturer in a lecture hall
The researchers hypothesized that students that learned the material in the standard lecture would perform better than those that learned via computer.
dataset_no_inter <- read_csv("http://tehrandav.is/courses/statistics/practice_datasets/factorial_ANOVA_dataset_no_interactions.csv")Rows: 36 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Lecture, Presentation
dbl (2): Score, PartID
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dataset_no_inter$Lecture <- recode_factor(dataset_no_inter$Lecture, "1"="Phys","2"="Soc","3"="Hist")
dataset_no_inter$Presentation <- recode_factor(dataset_no_inter$Presentation, "1"="Comp","2"="Stand")
summary(dataset_no_inter) Lecture Presentation Score PartID
Phys:12 Comp :18 Min. :18.00 Min. : 1.00
Soc :12 Stand:18 1st Qu.:26.75 1st Qu.: 9.75
Hist:12 Median :35.50 Median :18.50
Mean :35.44 Mean :18.50
3rd Qu.:42.50 3rd Qu.:27.25
Max. :53.00 Max. :36.00
40.1.1 Building the ANOVA model
I’m going to use aov().
aov_model <- aov(Score~Presentation + Lecture, data = dataset_no_inter)40.1.2 Testing assumptions
40.1.3 visual
aov_model %>% performance::check_model(panel=FALSE)aov_model %>% performance::check_normality()OK: residuals appear as normally distributed (p = 0.887).
aov_model %>% performance::check_homogeneity()Warning: Variances differ between groups (Bartlett Test, p = 0.040).
The model checks both for normality of residuals but fails homogeneity of variance. Following upon given the robustness of ANOVA (3x rule). I’m just going to construct a summary table of the variances var() that includes the mean value of the group, its variance, and the variance ratio (dividing each variance by the minimum value - in this case 7.467). What I’m looking fo in this instance is that no observed ratios are greater than 3.
dataset_no_inter %>%
group_by(Lecture, Presentation) %>%
summarise(mean = mean(Score),
var = var(Score)
)`summarise()` has grouped output by 'Lecture'. You can override using the
`.groups` argument.
# A tibble: 6 × 4
# Groups: Lecture [3]
Lecture Presentation mean var
<fct> <fct> <dbl> <dbl>
1 Phys Comp 46 48.8
2 Phys Stand 34 121.
3 Soc Comp 40 23.2
4 Soc Stand 23.7 7.47
5 Hist Comp 38 19.2
6 Hist Stand 31 106.
Potential issues, but, for now, moving on.
40.1.4 Interaction plot
Let’s plot the data. You’ll note in this plot I’m adding some code to get it into better APA shape. More on this in the next walkthrough. For now just focus on content.
# setting original parameters
p <- ggplot(data = dataset_no_inter, mapping=aes(x=Lecture,y=Score,group=Presentation))
# making a basic line plot
line_p <- p + stat_summary(geom="pointrange",fun.data = "mean_se", size=0.75, position=position_dodge(.25), aes(shape=Presentation)) +
stat_summary(geom = "line", fun = "mean", position=position_dodge(.25), aes(linetype=Presentation))
# adding APA elements
line_p <- line_p + theme(
axis.title = element_text(size = 16, face = "bold", lineheight = .55),
axis.text = element_text(size = 12),
legend.title = element_text(size = 12, face = "bold"),
legend.position = c(.75,.9)) +
xlab("Lecture type") +
ylab ("Performance score") +
theme(plot.margin=unit(c(.25,.25,.25,.25),"in")) +
theme_cowplot()
show(line_p)
40.1.5 Testing the ANOVA
aov_model %>% rstatix::anova_test(effect.size = "pes")ANOVA Table (type II tests)
Effect DFn DFd F p p<.05 pes
1 Presentation 1 32 22.670 3.97e-05 * 0.415
2 Lecture 2 32 3.779 3.40e-02 * 0.191
40.2 post hoc analysis
Admittedly, moving on to this step will ultimately be qualified by the presence of interactions (next week). For now, note that if you don’t have an interaction, you may simply proceed to run post-hoc analyses on any significant main effects in the manner you would with a One-way ANOVA. Easy, peasy, right. One thing to note, you need to make the appropriate multiple comparison corrections. R
eturning to our data with no interaction, we need to test for differences in both the Lecture and Presentation main effects.
Presentation: This one is easy. We only have two levels of
Presentation, so the omnibus \(F\) test tells us that our two groups are different. Nothing else to do here other than note which mean (Computer v. Standard) is greater than the other.Lecture: We have three levels of lecture, so were are going to need to run a post-hoc analysis. In this case, we may call upon our old standbys, Tukey and Bonferroni.
Just as last week, we can use emmeans() to run our post-hoc tests.
For example, to run a Tukey, you need call your ANOVA model. For the sake of clarity let’s rebuild the ANOVA model and save it to aov_model and then run emmeans()
Here I’m saving it to the object aov_model:
aov_model <- aov(Score~Presentation + Lecture, data = dataset_no_inter)From here you may call upon the emmeans() function to derive your posthocs. By itself, emmeans produces the means by levels of the IV(s) listed in its spec= argument. It takes the aov() model as a first argument, and the IVs of interest as the second.
# input your model into the emmeans to create an emm_mdl
emm_post_hocs <- emmeans(aov_model,specs = ~Lecture)emmeans() alone gives us the estimated marginal means for each of our levels:
emm_post_hocs Lecture emmean SE df lower.CL upper.CL
Phys 40.0 2.14 32 35.6 44.4
Soc 31.8 2.14 32 27.5 36.2
Hist 34.5 2.14 32 30.1 38.9
Results are averaged over the levels of: Presentation
Confidence level used: 0.95
to run a post-hoc comparison we then pipe it into pairs() and include the p value adjustment that we would like to make. If you want to perform a Tukey test follow this procedure you can simply pipe the previous (or save to an object and submit) to pairs():
emm_post_hocs %>% pairs(adjust="tukey") contrast estimate SE df t.ratio p.value
Phys - Soc 8.17 3.03 32 2.696 0.0291
Phys - Hist 5.50 3.03 32 1.815 0.1807
Soc - Hist -2.67 3.03 32 -0.880 0.6565
Results are averaged over the levels of: Presentation
P value adjustment: tukey method for comparing a family of 3 estimates
In this case it appears that our Tukey does reveal differences between means, when performance of those getting Physical Science Lectures is greater Social. However, no other pairwise differences are present.
Note that you may call other p-value adjustments using these methods:
emm_post_hocs %>% pairs(adjust="bonferroni") contrast estimate SE df t.ratio p.value
Phys - Soc 8.17 3.03 32 2.696 0.0333
Phys - Hist 5.50 3.03 32 1.815 0.2365
Soc - Hist -2.67 3.03 32 -0.880 1.0000
Results are averaged over the levels of: Presentation
P value adjustment: bonferroni method for 3 tests
40.3 What about planned contrasts?
You need to be careful when running planned contrasts in factorial ANOVA. In general I would recommend only running planned contrasts on a single main effect, or a planned contrast on the effects of one of your factors at a single level of your other (though you still need to proceed with caution here).
For example, using the data from the last section, I would only run a planned contrast related to the main effect of Lecture Type, or a contrast of Lecture Type means only in Computer presentation conditions (or Standard presentation). DO NOT, I repeat DO NOT run contrasts that go across levels of your other factors. Well, truthfully, you can do whatever you want, but you may find that your ability to meaningfully interpret your results in such cases is extremely limited.
We can run planned contrasts using emmeans() as well. In this case, we need to specify the contrasts.
First we need to obtain the emmeans() of the model including all cells (all factors). Using aov.model from the previous example:
emm_planned <- emmeans(aov_model, specs = ~Lecture+Presentation)OK. From here let’s build two custom contrasts. First, let’s do the Lecture contrast on the main effect. In this case let’s assume I want to contrast Phys with the combined other two conditions. Let’s take a look at the output of the emm that we just created above:
emm_planned Lecture Presentation emmean SE df lower.CL upper.CL
Phys Comp 45.9 2.47 32 40.9 50.9
Soc Comp 37.7 2.47 32 32.7 42.8
Hist Comp 40.4 2.47 32 35.4 45.4
Phys Stand 34.1 2.47 32 29.1 39.1
Soc Stand 25.9 2.47 32 20.9 31.0
Hist Stand 28.6 2.47 32 23.6 33.6
Confidence level used: 0.95
Using the output above, I identify which rows in the output contain Phys. In this case they are the 1st and 4th rows, Next I ensure that the summation of those rows is 1, so each gets 0.5. My remaining conditions must also equal -1. In this case there are four, so each is -0.25. Following the output above, then my contrast matrix is (writing elongated to emphasize the relationship between the codes and the rows above)::
lecture_contrast <- list( "Phys v. Soc + Hist" = c(.5, # 1st row = Phys
-.25, # 2nd row = Soc
-.25, # 3rd row = Hist
.5, # etc
-.25,
-.25))From here I simply call contrast() with my contrast matrix as an argument. So the entire pipe goes from:
emm_planned %>% contrast(lecture_contrast) contrast estimate SE df t.ratio p.value
Phys v. Soc + Hist 6.83 2.62 32 2.604 0.0138
Assuming I wanted to perform a set of orthogonal contrasts:
Phys v. Soc and Hist and
Soc v Hist
# build the contrast matrix
contrast_matrix <- list("Phys v. Soc + Hist" = c(.5,-.25,-.25,.5,-.25,-.25),
"Soc v Hist" = c(0, -.5, .5, 0, -.5, .5)
)
# run the contrasts
emm_planned %>% contrast(contrast_matrix) contrast estimate SE df t.ratio p.value
Phys v. Soc + Hist 6.83 2.62 32 2.604 0.0138
Soc v Hist 2.67 3.03 32 0.880 0.3853
In both cases, my p-values are unadjusted. I can add an adjustment to the contrast() argument like so:
# tukey is most common
emm_planned %>% contrast(contrast_matrix, adjust = "tukey")Note: adjust = "tukey" was changed to "sidak"
because "tukey" is only appropriate for one set of pairwise comparisons
contrast estimate SE df t.ratio p.value
Phys v. Soc + Hist 6.83 2.62 32 2.604 0.0275
Soc v Hist 2.67 3.03 32 0.880 0.6222
P value adjustment: sidak method for 2 tests
# or bonferroni is most conservative
emm_planned %>% contrast(contrast_matrix, adjust = "bonferroni") contrast estimate SE df t.ratio p.value
Phys v. Soc + Hist 6.83 2.62 32 2.604 0.0277
Soc v Hist 2.67 3.03 32 0.880 0.7706
P value adjustment: bonferroni method for 2 tests
# or holm is more liberal
emm_planned %>% contrast(contrast_matrix, adjust = "holm") contrast estimate SE df t.ratio p.value
Phys v. Soc + Hist 6.83 2.62 32 2.604 0.0277
Soc v Hist 2.67 3.03 32 0.880 0.3853
P value adjustment: holm method for 2 tests
40.4 More examples: a 3 × 3 ANOVA
In the previous example we focused in the 2 × 3 scenario for ease. Let’s look at how we might deal with a 3 × 3 example. Let’s use our dataset from the previous walkthrough involving a 3 (Smoking group) by 3 (Task) design. Let’s run another example using this data. Let’s use this as a chance to brush up on creating APA visualizations:
dataset <- read_delim("https://www.uvm.edu/~statdhtx/methods8/DataFiles/Sec13-5.dat",
"\t", escape_double = FALSE,
trim_ws = TRUE)Rows: 135 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
dbl (4): Task, Smkgrp, score, covar
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dataset$PartID <- seq_along(dataset$score)
dataset$Task <- recode_factor(dataset$Task,
"1" = "Pattern Recognition",
"2" = "Cognitive",
"3" = "Driving Simulation")
dataset$Smkgrp <- recode_factor(dataset$Smkgrp,
"3" = "Nonsmoking",
"2" = "Delayed",
"1" = "Active")
dataset# A tibble: 135 × 5
Task Smkgrp score covar PartID
<fct> <fct> <dbl> <dbl> <int>
1 Pattern Recognition Active 9 107 1
2 Pattern Recognition Active 8 133 2
3 Pattern Recognition Active 12 123 3
4 Pattern Recognition Active 10 94 4
5 Pattern Recognition Active 7 83 5
6 Pattern Recognition Active 10 86 6
7 Pattern Recognition Active 9 112 7
8 Pattern Recognition Active 11 117 8
9 Pattern Recognition Active 8 130 9
10 Pattern Recognition Active 10 111 10
# ℹ 125 more rows
40.4.1 Interaction plot
# line plot
ggplot(data = dataset, mapping=aes(x=Smkgrp,y=score,group=Task, shape = Task)) +
stat_summary(geom="pointrange",
fun.data = "mean_cl_normal",
position=position_dodge(.5)) +
stat_summary(geom = "line",
fun = "mean",
position=position_dodge(.5),
aes(linetype=Task)) +
theme_cowplot()
Before continuing it might be useful to get a feel for whats going on in the dataset. In this case, both the performance on the Cognitive and Driving simulation tasks seems to be impacted by the degree of smoking. However the Pattern recognition task does not appear to be affected.
Another way of viewing this is that scores on the Cognitive task tended to be greater than the other two Task conditions. Let’s hold onto our impressions of the data and move on.
40.4.2 ANOVA model
As before we can build our ANOVA model and test it against the requisite assumptions:
aov_model <- aov(score~Smkgrp+Task,data = dataset)aov_model %>% performance::check_model(panel=F)aov_model %>% performance::check_normality()Warning: Non-normality of residuals detected (p < .001).
aov_model %>% performance::check_homogeneity()Warning: Variances differ between groups (Bartlett Test, p = 0.000).
As in the last walkthrough we’ll ignore the issues with our assumption checks
aov_model %>% rstatix::anova_test(effect.size = "pes")ANOVA Table (type II tests)
Effect DFn DFd F p p<.05 pes
1 Smkgrp 2 130 7.935 5.60e-04 * 0.109
2 Task 2 130 125.398 4.58e-31 * 0.659
Here we have significant main effects for Smkgrp and Task. We’ll need to run separate post-hoc analyses for each of our observed effects (given that both factors have 3 levels). Before moving on, I would recommend writing out the main points of this table to refer to later in your write up.
main effect for Task, \(F\) (2, 130) = 125.40, \(p\) < .001, \(n_p^2\) = .66.
main effect for Smoking Group, \(F\) (2, 130) = 7.94, \(p\) = .001, \(n_p^2\) = .11
40.5 Post-hoc analysis
40.5.1 Task
main_effect_Task <- emmeans(aov_model, ~Task)
main_effect_Task %>% pairs(adjust="tukey") contrast estimate SE df t.ratio p.value
Pattern Recognition - Cognitive -29.13 2.25 130 -12.927 <.0001
Pattern Recognition - Driving Simulation 3.29 2.25 130 1.459 0.3139
Cognitive - Driving Simulation 32.42 2.25 130 14.386 <.0001
Results are averaged over the levels of: Smkgrp
P value adjustment: tukey method for comparing a family of 3 estimates
The results confirm that overall, Cognitive task performance was greater than the other two conditions. To get the descriptive stats for these contrasts, we can use rstatix::get_summary, only specifying Task as our grouping variable:
dataset %>%
group_by(Task) %>%
get_summary_stats(score, type = "mean_se")# A tibble: 3 × 5
Task variable n mean se
<fct> <fct> <dbl> <dbl> <dbl>
1 Pattern Recognition score 45 9.64 0.673
2 Cognitive score 45 38.8 2.69
3 Driving Simulation score 45 6.36 0.85
40.5.2 Smkgrp
main_effect_Smkgrp <- emmeans(aov_model, ~Smkgrp)
main_effect_Smkgrp %>% pairs(adjust="tukey") contrast estimate SE df t.ratio p.value
Nonsmoking - Delayed 3.69 2.25 130 1.637 0.2339
Nonsmoking - Active 8.93 2.25 130 3.964 0.0004
Delayed - Active 5.24 2.25 130 2.327 0.0556
Results are averaged over the levels of: Task
P value adjustment: tukey method for comparing a family of 3 estimates
and to get these descriptive stats, we run do the same as above, only with Smkgrp as our grouping variable:
dataset %>%
group_by(Smkgrp) %>%
get_summary_stats(score, type = "mean_se")# A tibble: 3 × 5
Smkgrp variable n mean se
<fct> <fct> <dbl> <dbl> <dbl>
1 Nonsmoking score 45 22.5 3.04
2 Delayed score 45 18.8 2.89
3 Active score 45 13.5 2.11
40.6 Example write up
Let’s use this space to provide an example write-up for our factorial ANOVA. To do this I need to refer back to values I generated in my ANOVA table, my post-hoc tests, and my descriptive statistics above.
To test our hypothesis we ran a 3 (Task) × 3 (Smoking Group) ANOVA on cognitive performance scores. Our ANOVA revealed a significant main effect for Task, \(F\) (2, 130) = 125.40, \(p\) < .001, \(n_p^2\) = .66. Post hoc analysis of Task revealed that participants scored higher in the Cognitive group (\(M±SE\) = 38.78 ± 2.69) than the Pattern Recognition (9.64 ± 0.67 ) and Driving Simulator (6.36 ± 0.84) groups (\(ps\) < .05).
Our analysis also revealed a main effect for Smoking Group, \(F\) (2, 130) = 7.94, \(p\) = .001, \(n_p^2\) = .11. Post hoc analysis of Smoking Group revealed participants scored higher in the Nonsmoking group than the Active group (\(p\) < .05). No other statistically significant group differences were observed.