40 Factorial ANOVA - Post hoc analysis of Main effects

In the last walkthrough, we analyzed a 2×2 ANOVA focusing on our main effects. Because each of our factors only had two levels, there was no need to follow up significant main effects with pairwise post hoc analysis. In this walkthrough we will take a look at examples where our main effects have 3 or more levels and thus demand a post-hoc analysis. First let's bring in the necessary packages:

pacman::p_load(tidyverse,
               cowplot,
               emmeans,
               rstatix, ez)
40.1 Example: A 2 × 3 ANOVA
Thirty-six college students were randomly assigned to 3 groups (n = 12 per group). Students in each group were asked to watch and take notes on a 50-min lecture. One week later all students were tested on the content of their lectures, and their scores were compared. Groups differed by the lecture's subject matter, where:
- 1 = Physics lecture
- 2 = Social Science lecture
- 3 = History lecture
The lectures were presented in two manners:
- 1 = via computer
- 2 = standard method (lecturer in a lecture hall)
The researchers hypothesized that students who learned the material in the standard lecture would perform better than those who learned via computer.
dataset_no_inter <- read_csv("http://tehrandav.is/courses/statistics/practice_datasets/factorial_ANOVA_dataset_no_interactions.csv")
Rows: 36 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Lecture, Presentation
dbl (2): Score, PartID
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dataset_no_inter$Lecture <- recode_factor(dataset_no_inter$Lecture, "1"="Phys", "2"="Soc", "3"="Hist")
dataset_no_inter$Presentation <- recode_factor(dataset_no_inter$Presentation, "1"="Comp", "2"="Stand")
summary(dataset_no_inter)
Lecture Presentation Score PartID
Phys:12 Comp :18 Min. :18.00 Min. : 1.00
Soc :12 Stand:18 1st Qu.:26.75 1st Qu.: 9.75
Hist:12 Median :35.50 Median :18.50
Mean :35.44 Mean :18.50
3rd Qu.:42.50 3rd Qu.:27.25
Max. :53.00 Max. :36.00
40.1.1 Building the ANOVA model
I’m going to use aov().

aov_model <- aov(Score ~ Presentation + Lecture, data = dataset_no_inter)
40.1.2 Testing assumptions
40.1.3 Visual checks
aov_model %>% performance::check_model(panel = FALSE)

aov_model %>% performance::check_normality()
OK: residuals appear as normally distributed (p = 0.887).
aov_model %>% performance::check_homogeneity()
Warning: Variances differ between groups (Bartlett Test, p = 0.040).
The model passes the check for normality of residuals but fails homogeneity of variance. Given the robustness of ANOVA (the 3× rule), I’m just going to construct a summary table using var() that includes the mean value of each group and its variance, and then compare each variance against the minimum value (in this case 7.467). What I’m looking for in this instance is that no observed ratio is greater than 3.
dataset_no_inter %>%
  group_by(Lecture, Presentation) %>%
  summarise(mean = mean(Score),
            var = var(Score))
`summarise()` has grouped output by 'Lecture'. You can override using the
`.groups` argument.
# A tibble: 6 × 4
# Groups: Lecture [3]
Lecture Presentation mean var
<fct> <fct> <dbl> <dbl>
1 Phys Comp 46 48.8
2 Phys Stand 34 121.
3 Soc Comp 40 23.2
4 Soc Stand 23.7 7.47
5 Hist Comp 38 19.2
6 Hist Stand 31 106.
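The ratio check itself can be sketched in base R using the variances from the table above (the numbers below are copied from that output, so treat this as an illustration rather than part of the pipeline):

```r
# variances copied from the summary table above
vars <- c(Phys_Comp = 48.8, Phys_Stand = 121, Soc_Comp = 23.2,
          Soc_Stand = 7.47, Hist_Comp = 19.2, Hist_Stand = 106)

# divide each variance by the smallest; ratios above 3 violate the 3x rule
var_ratio <- vars / min(vars)
round(var_ratio, 2)
any(var_ratio > 3)   # TRUE -- several cells exceed 3
```

In the dplyr pipe above, the same check could be added by appending something like `ungroup() %>% mutate(var_ratio = var / min(var))` after the `summarise()` call.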
Potential issues, but, for now, moving on.
40.1.4 Interaction plot
Let’s plot the data. You’ll note in this plot I’m adding some code to get it into better APA shape. More on this in the next walkthrough. For now just focus on content.
# setting original parameters
p <- ggplot(data = dataset_no_inter, mapping = aes(x = Lecture, y = Score, group = Presentation))

# making a basic line plot
line_p <- p + stat_summary(geom = "pointrange", fun.data = "mean_se", size = 0.75,
                           position = position_dodge(.25), aes(shape = Presentation)) +
  stat_summary(geom = "line", fun = "mean", position = position_dodge(.25), aes(linetype = Presentation))
# adding APA elements
line_p <- line_p +
  theme_cowplot() +   # apply the base theme first so it doesn't override the tweaks below
  theme(
    axis.title = element_text(size = 16, face = "bold", lineheight = .55),
    axis.text = element_text(size = 12),
    legend.title = element_text(size = 12, face = "bold"),
    legend.position = c(.75, .9),
    plot.margin = unit(c(.25, .25, .25, .25), "in")) +
  xlab("Lecture type") +
  ylab("Performance score")

show(line_p)
40.1.5 Testing the ANOVA
aov_model %>% rstatix::anova_test(effect.size = "pes")
ANOVA Table (type II tests)
Effect DFn DFd F p p<.05 pes
1 Presentation 1 32 22.670 3.97e-05 * 0.415
2 Lecture 2 32 3.779 3.40e-02 * 0.191
40.2 Post hoc analysis
Admittedly, moving on to this step will ultimately be qualified by the presence of interactions (next week). For now, note that if you don’t have an interaction, you may simply proceed to run post-hoc analyses on any significant main effects in the manner you would with a one-way ANOVA. Easy peasy, right? One thing to note: you need to make the appropriate multiple-comparison corrections.
Returning to our data with no interaction, we need to test for differences in both the Lecture and Presentation main effects.
- Presentation: This one is easy. We only have two levels of Presentation, so the omnibus \(F\) test tells us that our two groups are different. Nothing else to do here other than note which mean (Computer v. Standard) is greater than the other.
- Lecture: We have three levels of Lecture, so we are going to need to run a post-hoc analysis. In this case, we may call upon our old standbys, Tukey and Bonferroni.
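If you want the two Presentation means on hand for the write-up, one option (a minimal sketch assuming the aov_model fit above and the emmeans package loaded) is to ask emmeans() for that factor alone:

```r
# estimated marginal means for the two Presentation conditions,
# averaged over the levels of Lecture
emmeans(aov_model, specs = ~Presentation)
```

The printed table gives each level's emmean and 95% CI; whichever emmean is larger tells you the direction of the main effect.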
Just as last week, we can use emmeans() to run our post-hoc tests.
For example, to run a Tukey test you need to call your ANOVA model. For the sake of clarity, let’s rebuild the ANOVA model, saving it to the object aov_model, and then run emmeans():
aov_model <- aov(Score ~ Presentation + Lecture, data = dataset_no_inter)
From here you may call upon the emmeans() function to derive your post hocs. By itself, emmeans() produces the means by levels of the IV(s) listed in its specs= argument. It takes the aov() model as its first argument, and the IVs of interest as the second.
# input your model into emmeans() to create the post-hoc object
emm_post_hocs <- emmeans(aov_model, specs = ~Lecture)
emmeans()
alone gives us the estimated marginal means for each of our levels:
emm_post_hocs
Lecture emmean SE df lower.CL upper.CL
Phys 40.0 2.14 32 35.6 44.4
Soc 31.8 2.14 32 27.5 36.2
Hist 34.5 2.14 32 30.1 38.9
Results are averaged over the levels of: Presentation
Confidence level used: 0.95
To run a post-hoc comparison, we then pipe it into pairs() and include the p-value adjustment that we would like to make. To perform a Tukey test, simply pipe the previous result (or save it to an object and submit that) to pairs():
emm_post_hocs %>% pairs(adjust = "tukey")
contrast estimate SE df t.ratio p.value
Phys - Soc 8.17 3.03 32 2.696 0.0291
Phys - Hist 5.50 3.03 32 1.815 0.1807
Soc - Hist -2.67 3.03 32 -0.880 0.6565
Results are averaged over the levels of: Presentation
P value adjustment: tukey method for comparing a family of 3 estimates
In this case our Tukey test does reveal a difference between means: performance of those getting the Physics lecture was greater than those getting the Social Science lecture. No other pairwise differences are present.
Note that you may call other p-value adjustments using these methods:
emm_post_hocs %>% pairs(adjust = "bonferroni")
contrast estimate SE df t.ratio p.value
Phys - Soc 8.17 3.03 32 2.696 0.0333
Phys - Hist 5.50 3.03 32 1.815 0.2365
Soc - Hist -2.67 3.03 32 -0.880 1.0000
Results are averaged over the levels of: Presentation
P value adjustment: bonferroni method for 3 tests
40.3 What about planned contrasts?
You need to be careful when running planned contrasts in factorial ANOVA. In general I would recommend only running planned contrasts on a single main effect, or a planned contrast on the effects of one of your factors at a single level of your other (though you still need to proceed with caution here).
For example, using the data from the last section, I would only run a planned contrast related to the main effect of Lecture Type, or a contrast of Lecture Type means only in the Computer presentation conditions (or the Standard presentation conditions). DO NOT, I repeat DO NOT run contrasts that go across levels of your other factors. Well, truthfully, you can do whatever you want, but you may find that your ability to meaningfully interpret your results in such cases is extremely limited.
We can run planned contrasts using emmeans()
as well. In this case, we need to specify the contrasts.
First we need to obtain the emmeans() of the model including all cells (all factors). Using aov_model from the previous example:
emm_planned <- emmeans(aov_model, specs = ~Lecture + Presentation)
OK. From here let’s build two custom contrasts. First, let’s do the Lecture
contrast on the main effect. In this case let’s assume I want to contrast Phys
with the combined other two conditions. Let’s take a look at the output of the emm
that we just created above:
emm_planned
Lecture Presentation emmean SE df lower.CL upper.CL
Phys Comp 45.9 2.47 32 40.9 50.9
Soc Comp 37.7 2.47 32 32.7 42.8
Hist Comp 40.4 2.47 32 35.4 45.4
Phys Stand 34.1 2.47 32 29.1 39.1
Soc Stand 25.9 2.47 32 20.9 31.0
Hist Stand 28.6 2.47 32 23.6 33.6
Confidence level used: 0.95
Using the output above, I identify which rows in the output contain Phys. In this case they are the 1st and 4th rows. Next I ensure that the coefficients for those rows sum to 1, so each gets 0.5. The coefficients for my remaining conditions must sum to -1; in this case there are four, so each gets -0.25. Following the output above, my contrast matrix is (written out long-form to emphasize the relationship between the codes and the rows above):
lecture_contrast <- list("Phys v. Soc + Hist" = c(.5,    # 1st row = Phys
                                                  -.25,  # 2nd row = Soc
                                                  -.25,  # 3rd row = Hist
                                                  .5,    # 4th row = Phys
                                                  -.25,  # 5th row = Soc
                                                  -.25)) # 6th row = Hist
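A quick sanity check on any contrast vector is that its coefficients sum to zero, the +1 side balancing the -1 side (base R, using the codes above):

```r
# coefficients from the contrast above
codes <- c(.5, -.25, -.25, .5, -.25, -.25)
sum(codes)  # 0 -> the weights balance
```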
From here I simply call contrast() with my contrast matrix as an argument. So the entire pipe is:
emm_planned %>% contrast(lecture_contrast)
contrast estimate SE df t.ratio p.value
Phys v. Soc + Hist 6.83 2.62 32 2.604 0.0138
Assuming I wanted to perform a set of orthogonal contrasts:
- Phys v. Soc + Hist, and
- Soc v. Hist
# build the contrast matrix
contrast_matrix <- list("Phys v. Soc + Hist" = c(.5, -.25, -.25, .5, -.25, -.25),
                        "Soc v Hist" = c(0, -.5, .5, 0, -.5, .5))
# run the contrasts
emm_planned %>% contrast(contrast_matrix)
contrast estimate SE df t.ratio p.value
Phys v. Soc + Hist 6.83 2.62 32 2.604 0.0138
Soc v Hist 2.67 3.03 32 0.880 0.3853
In both cases, my p-values are unadjusted. I can add an adjustment argument to contrast() like so:
# tukey is most common
emm_planned %>% contrast(contrast_matrix, adjust = "tukey")
Note: adjust = "tukey" was changed to "sidak"
because "tukey" is only appropriate for one set of pairwise comparisons
contrast estimate SE df t.ratio p.value
Phys v. Soc + Hist 6.83 2.62 32 2.604 0.0275
Soc v Hist 2.67 3.03 32 0.880 0.6222
P value adjustment: sidak method for 2 tests
# or bonferroni is most conservative
emm_planned %>% contrast(contrast_matrix, adjust = "bonferroni")
contrast estimate SE df t.ratio p.value
Phys v. Soc + Hist 6.83 2.62 32 2.604 0.0277
Soc v Hist 2.67 3.03 32 0.880 0.7706
P value adjustment: bonferroni method for 2 tests
# or holm is more liberal
emm_planned %>% contrast(contrast_matrix, adjust = "holm")
contrast estimate SE df t.ratio p.value
Phys v. Soc + Hist 6.83 2.62 32 2.604 0.0277
Soc v Hist 2.67 3.03 32 0.880 0.3853
P value adjustment: holm method for 2 tests
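As a quick check that the two contrasts above really are orthogonal, the sum of the products of their coefficients should be zero (a base R sketch using the codes from contrast_matrix):

```r
c1 <- c(.5, -.25, -.25, .5, -.25, -.25)  # Phys v. Soc + Hist
c2 <- c(0, -.5, .5, 0, -.5, .5)          # Soc v Hist
sum(c1 * c2)  # 0 -> orthogonal
```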
40.4 More examples: a 3 × 3 ANOVA
In the previous example we focused on the 2 × 3 scenario for ease. Now let’s look at how we might deal with a 3 × 3 example, using our dataset from the previous walkthrough involving a 3 (Smoking group) × 3 (Task) design. We’ll also use this as a chance to brush up on creating APA visualizations:
dataset <- read_delim("https://www.uvm.edu/~statdhtx/methods8/DataFiles/Sec13-5.dat",
                      "\t", escape_double = FALSE,
                      trim_ws = TRUE)
Rows: 135 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
dbl (4): Task, Smkgrp, score, covar
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dataset$PartID <- seq_along(dataset$score)
dataset$Task <- recode_factor(dataset$Task,
                              "1" = "Pattern Recognition",
                              "2" = "Cognitive",
                              "3" = "Driving Simulation")
dataset$Smkgrp <- recode_factor(dataset$Smkgrp,
                                "3" = "Nonsmoking",
                                "2" = "Delayed",
                                "1" = "Active")
dataset
# A tibble: 135 × 5
Task Smkgrp score covar PartID
<fct> <fct> <dbl> <dbl> <int>
1 Pattern Recognition Active 9 107 1
2 Pattern Recognition Active 8 133 2
3 Pattern Recognition Active 12 123 3
4 Pattern Recognition Active 10 94 4
5 Pattern Recognition Active 7 83 5
6 Pattern Recognition Active 10 86 6
7 Pattern Recognition Active 9 112 7
8 Pattern Recognition Active 11 117 8
9 Pattern Recognition Active 8 130 9
10 Pattern Recognition Active 10 111 10
# ℹ 125 more rows
40.4.1 Interaction plot
# line plot
ggplot(data = dataset, mapping=aes(x=Smkgrp,y=score,group=Task, shape = Task)) +
stat_summary(geom="pointrange",
fun.data = "mean_cl_normal",
position=position_dodge(.5)) +
stat_summary(geom = "line",
fun = "mean",
position=position_dodge(.5),
aes(linetype=Task)) +
theme_cowplot()
Before continuing it might be useful to get a feel for what’s going on in the dataset. In this case, performance on both the Cognitive and Driving Simulation tasks seems to be impacted by the degree of smoking. However, the Pattern Recognition task does not appear to be affected.
Another way of viewing this is that scores on the Cognitive task tended to be greater than the other two Task conditions. Let’s hold onto our impressions of the data and move on.
40.4.2 ANOVA model
As before we can build our ANOVA model and test it against the requisite assumptions:
aov_model <- aov(score ~ Smkgrp + Task, data = dataset)

aov_model %>% performance::check_model(panel = FALSE)

aov_model %>% performance::check_normality()
Warning: Non-normality of residuals detected (p < .001).
aov_model %>% performance::check_homogeneity()
Warning: Variances differ between groups (Bartlett Test, p = 0.000).
As in the last walkthrough, we’ll ignore the issues with our assumption checks for now.
aov_model %>% rstatix::anova_test(effect.size = "pes")
ANOVA Table (type II tests)
Effect DFn DFd F p p<.05 pes
1 Smkgrp 2 130 7.935 5.60e-04 * 0.109
2 Task 2 130 125.398 4.58e-31 * 0.659
Here we have significant main effects for Smkgrp
and Task
. We’ll need to run separate post-hoc analyses for each of our observed effects (given that both factors have 3 levels). Before moving on, I would recommend writing out the main points of this table to refer to later in your write up.
- main effect for Task, \(F\)(2, 130) = 125.40, \(p\) < .001, \(\eta_p^2\) = .66
- main effect for Smoking Group, \(F\)(2, 130) = 7.94, \(p\) = .001, \(\eta_p^2\) = .11
40.5 Post-hoc analysis
40.5.1 Task
main_effect_Task <- emmeans(aov_model, ~Task)

main_effect_Task %>% pairs(adjust = "tukey")
contrast estimate SE df t.ratio p.value
Pattern Recognition - Cognitive -29.13 2.25 130 -12.927 <.0001
Pattern Recognition - Driving Simulation 3.29 2.25 130 1.459 0.3139
Cognitive - Driving Simulation 32.42 2.25 130 14.386 <.0001
Results are averaged over the levels of: Smkgrp
P value adjustment: tukey method for comparing a family of 3 estimates
The results confirm that, overall, Cognitive task performance was greater than in the other two conditions. To get the descriptive stats for these contrasts, we can use rstatix::get_summary_stats(), specifying only Task as our grouping variable:
dataset %>%
  group_by(Task) %>%
  get_summary_stats(score, type = "mean_se")
# A tibble: 3 × 5
Task variable n mean se
<fct> <fct> <dbl> <dbl> <dbl>
1 Pattern Recognition score 45 9.64 0.673
2 Cognitive score 45 38.8 2.69
3 Driving Simulation score 45 6.36 0.85
40.5.2 Smkgrp
main_effect_Smkgrp <- emmeans(aov_model, ~Smkgrp)

main_effect_Smkgrp %>% pairs(adjust = "tukey")
contrast estimate SE df t.ratio p.value
Nonsmoking - Delayed 3.69 2.25 130 1.637 0.2339
Nonsmoking - Active 8.93 2.25 130 3.964 0.0004
Delayed - Active 5.24 2.25 130 2.327 0.0556
Results are averaged over the levels of: Task
P value adjustment: tukey method for comparing a family of 3 estimates
And to get these descriptive stats, we do the same as above, only with Smkgrp as our grouping variable:
dataset %>%
  group_by(Smkgrp) %>%
  get_summary_stats(score, type = "mean_se")
# A tibble: 3 × 5
Smkgrp variable n mean se
<fct> <fct> <dbl> <dbl> <dbl>
1 Nonsmoking score 45 22.5 3.04
2 Delayed score 45 18.8 2.89
3 Active score 45 13.5 2.11
40.6 Example write up
Let’s use this space to provide an example write-up for our factorial ANOVA. To do this I need to refer back to values I generated in my ANOVA table, my post-hoc tests, and my descriptive statistics above.
To test our hypothesis we ran a 3 (Task) × 3 (Smoking Group) ANOVA on cognitive performance scores. Our ANOVA revealed a significant main effect for Task, \(F\)(2, 130) = 125.40, \(p\) < .001, \(\eta_p^2\) = .66. Post hoc analysis of Task revealed that participants scored higher in the Cognitive group (\(M±SE\) = 38.78 ± 2.69) than in the Pattern Recognition (9.64 ± 0.67) and Driving Simulation (6.36 ± 0.85) groups (\(ps\) < .05).
Our analysis also revealed a main effect for Smoking Group, \(F\)(2, 130) = 7.94, \(p\) = .001, \(\eta_p^2\) = .11. Post hoc analysis of Smoking Group revealed participants scored higher in the Nonsmoking group than in the Active group (\(p\) < .05). No other statistically significant group differences were observed.