Let’s ramp up the complexity. Here is a dataset testing Recall over three days as a function of depth of processing (Lo, Med, High). In this case we have two within-subjects factors.
Rows: 90 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): ProcessDepth, Day
dbl (2): Subject, Recalled
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
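If we wanted to quiet this message, we could declare the column types up front when reading the file. A quick sketch (the filename here is hypothetical):

# filename is hypothetical; col_types = "ddcc" means double, double, character, character
factorial_df <- readr::read_csv("factorial_recall_data.csv", col_types = "ddcc")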
factorial_df
# A tibble: 90 × 4
Subject Recalled ProcessDepth Day
<dbl> <dbl> <chr> <chr>
1 1 14 High Day1
2 2 10 High Day1
3 3 13 High Day1
4 4 14 High Day1
5 5 12 High Day1
6 6 8 High Day1
7 7 13 High Day1
8 8 11 High Day1
9 9 9 High Day1
10 10 12 High Day1
# ℹ 80 more rows
One thing to note is that Subject is being treated as a <dbl> number. It should be treated as a factor.
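A quick fix, assuming the tidyverse is loaded as in the earlier walkthroughs:

# convert Subject from a number to a factor so it is treated as a discrete grouping variable
factorial_df <- factorial_df %>%
  mutate(Subject = factor(Subject))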
Since I have two factors here, one strategy may be to facet my spaghetti plots. In this case I am creating separate spaghetti plots for each level of ProcessDepth:
p <- ggplot(data = factorial_df, aes(x = Day, y = Recalled, group = Subject)) +
  geom_point(aes(col = Subject)) +
  geom_line(aes(col = Subject)) +
  facet_wrap(~ProcessDepth)
plotly::ggplotly(p)
FWIW, one thing that this plot has clued me in on is that the order of my Processing Depth levels is High, Lo, Med (R’s default alphabetical ordering). I might actually want that to be Lo, Med, High:
# reorder the factor levels so the facets appear Lo, Med, High
factorial_df <- factorial_df %>%
  mutate(ProcessDepth = factor(ProcessDepth, levels = c("Lo", "Med", "High")))

p <- ggplot(data = factorial_df, aes(x = Day, y = Recalled, group = Subject)) +
  geom_point(aes(col = Subject)) +
  geom_line(aes(col = Subject)) +
  facet_wrap(~ProcessDepth)
plotly::ggplotly(p)
48.2.2 means plots
Again, we need to make the appropriate correction for our error bars. In this case the number of within groups is the number of cells formed by crossing Day (1, 2, 3) and Processing Depth (Lo, Med, High). As this is a 3 × 3 design there are 9 cells, or nWithinGroups = 9. As before, this is handled auto-magically in summarySEwithin2 (a sketch follows below).
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
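Here’s a minimal sketch of one way to build the means plot, assuming the summarySEwithin2() helper from the earlier walkthroughs is loaded (and, per the warning above, using linewidth rather than the deprecated size aesthetic):

# get condition means and corrected (within-subjects) error bars;
# assumes summarySEwithin2() from the earlier walkthroughs is loaded
recall_means <- summarySEwithin2(data = factorial_df,
                                 measurevar = "Recalled",
                                 withinvars = c("ProcessDepth", "Day"),
                                 idvar = "Subject")

ggplot(recall_means, aes(x = Day, y = Recalled,
                         group = ProcessDepth, color = ProcessDepth)) +
  geom_line(linewidth = 0.75) +
  geom_point() +
  geom_errorbar(aes(ymin = Recalled - se, ymax = Recalled + se), width = 0.1) +
  theme_classic()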
48.2.3 rain cloud plot
Here’s how we might construct a rain cloud plot for this. For what it’s worth, this took a lot of trial and error to get the positioning right. I’d only do this for plots you intend to share with the world. Also note, I’m including the viridis package here (including scale_fill_viridis). This is a package that provides a color palette that is colorblind friendly. See here for more information.
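A sketch of one possible construction using the ggdist package (an assumption on my part; the exact geoms and positioning tweaks in the original may differ):

library(ggdist)
library(viridis)

ggplot(factorial_df, aes(x = Day, y = Recalled, fill = Day)) +
  # the "cloud": a half-eye density for each Day
  stat_halfeye(adjust = 0.5, justification = -0.2, .width = 0, point_colour = NA) +
  # a narrow boxplot tucked beneath each cloud
  geom_boxplot(width = 0.12, outlier.shape = NA, alpha = 0.5) +
  # the "rain": the raw data points, dotplot-style, to the left
  stat_dots(side = "left", justification = 1.1, binwidth = 0.25) +
  facet_wrap(~ProcessDepth) +
  scale_fill_viridis(discrete = TRUE) +
  theme_classic()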
48.3 running the omnibus ANOVA

omnibus_aov <- afex::aov_ez(id = "Subject",
                            dv = "Recalled",
                            data = factorial_df,
                            within = c("ProcessDepth", "Day"))

# alternative using aov_car
# omnibus_aov <- afex::aov_car(Recalled ~ ProcessDepth * Day + Error(Subject/(ProcessDepth*Day)),
#                              data = factorial_df)
Here we have two main effects and an interaction. Let’s unpack the interaction by taking a look at whether Recall increases over successive days.
48.4 running the simple effects ANOVAs
The Cohen chapter assigned this week recommends that one avoid the use of pooled error terms when performing simple effects analysis of within-subjects ANOVA. This is definitely the case when the sphericity assumption is violated, but as a general rule, even when the assumption is not technically violated, deviations from sphericity can artificially inflate Type I error. So in this case, simple follow-up ANOVAs that refer to their own error terms are justified. That said, as Type I error goes down, Type II error goes up. Something to consider.
Something else to consider—remember this little tidbit from earlier:
[…] for within-subjects (or repeated measures) effects, the error term is the Treatment x Subjects interaction, and the nature of the TxS interaction across all treatment levels can be very different than it is for any particular pair of treatment levels.
This applies to simple effects ANOVA as well.
This can be accomplished using joint_tests() from emmeans, telling it how we want to split up our original model. As before, we need to add model = "multivariate" to our call (this tells R to use the simple-effects error term and not the omnibus one).
joint_tests(omnibus_aov, by = "ProcessDepth", model = "multivariate")
ProcessDepth = Lo:
model term df1 df2 F.ratio p.value
Day 2 9 1.487 0.2768
ProcessDepth = Med:
model term df1 df2 F.ratio p.value
Day 2 9 30.146 0.0001
ProcessDepth = High:
model term df1 df2 F.ratio p.value
Day 2 9 32.045 0.0001
48.4.0.1 a note on Univariate v. Multivariate methods
You might be asking yourself “why multivariate in the model argument above?” Briefly, univariate v. multivariate statistical methods refer to the number of dependent variables associated with the model per fundamental unit of analysis. In psychology research the fundamental unit of analysis is typically the participant, with measures taken at the level of the participant. A univariate model has one dependent variable per participant whereas a multivariate model has two or more. Perhaps some of you are more familiar with this distinction in the context of one-way MANOVA, which analyzes two or more dependent variables simultaneously in an ANOVA design. MANOVA (multivariate ANOVA) is sometimes preferred to running separate univariate ANOVAs if one believes that the dependent variables involved are correlated with one another. This is partly because MANOVA doesn’t rely on the assumption of sphericity but does consider the covariance among the measures. In this case MANOVA factors these correlations into the model testing the IVs.
I bring all of this up because one way to conceptualize repeated-measures ANOVA is as a multivariate method: multiple measures are taken at the level of the participant. Specifying multivariate tells R to treat it this way and therefore derive the error term and degrees of freedom from a Multivariate Analysis of Variance (MANOVA). In MANOVA your degrees of freedom are reduced, and it tends to be a less powerful statistical method overall, leading to a higher likelihood of Type II errors. Applying MANOVA to a dataset, one typically obtains Pillai’s Trace, a statistic ranging from 0 to 1, where larger values indicate that a term contributes more to the overall model. Additionally, MANOVA provides an approximated F-value.
Below I contrast the two approaches: running our data as an RM-ANOVA vs. as a MANOVA. FWIW, you don’t need to explicitly call the MANOVA in your omnibus test; I’m only showing the MANOVA as a demonstration. In short, you can simply proceed as usual and specify “multivariate” in any follow-ups.
omnibus_aov <- afex::aov_ez(id = "Subject",
                            dv = "Recalled",
                            data = factorial_df,
                            within = c("ProcessDepth", "Day"))
omnibus_aov
# alternative using aov_car
# omnibus_aov <- afex::aov_car(Recalled ~ ProcessDepth * Day + Error(Subject/(ProcessDepth*Day)),
#                              data = factorial_df)
omnibus_maov <- afex::aov_ez(id = "Subject",
                             dv = "Recalled",
                             data = factorial_df,
                             within = c("ProcessDepth", "Day"),
                             return = "Anova")
omnibus_maov
Type III Repeated Measures MANOVA Tests: Pillai test statistic
Df test stat approx F num Df den Df Pr(>F)
(Intercept) 1 0.99697 2957.15 1 9 1.209e-12 ***
ProcessDepth 1 0.94785 72.70 2 8 7.396e-06 ***
Day 1 0.89993 35.97 2 8 0.0001003 ***
ProcessDepth:Day 1 0.88596 11.65 4 6 0.0054256 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We see that the p-values for our tests have increased (though they remain significant), indicating the loss in power.
48.4.1 post-hoc tests
Given that we found an effect for Day at the High and Med levels of ProcessDepth, we need to perform post-hoc follow-ups. Again let’s specify model = "multivariate":
post_hoc_tests <- emmeans(omnibus_aov, by = "ProcessDepth", specs = ~ Day,
                          model = "multivariate", adjust = "tukey")
post_hoc_tests %>% pairs(adjust = "tukey")
Remember from this table we should only focus on ProcessDepth = Med and ProcessDepth = High.
48.5 Writing this up
… To test this hypothesis we ran a 3 (Process Depth) × 3 (Day) repeated-measures ANOVA. Where there were violations of sphericity, we used Greenhouse-Geisser corrected degrees of freedom. This analysis revealed a significant interaction, \(F\)(2.86, 25.78) = 10.64, \(p\) < .001, \(\eta_p^2\) = .54. To test this interaction we ran separate simple effects ANOVAs for Day at each level of Process Depth. A simple effect for Day was found for High processing depth, \(F\)(2, 9) = 32.05, \(p\) < .001, where there were statistically significant increases in performance from each Day to the next (Tukey HSD, \(p\) < .05). A simple effect for Day was also found for Med processing depth, \(F\)(2, 9) = 30.15, \(p\) < .001. While scores increased across the 3 days, only Day 3 was significantly greater than the other two (\(p\)s < .05). No simple effect was observed for Day in the Lo processing depth condition.
See the next walkthrough for how to do this in SPSS.
48.6 Epilogue: throwing caution to the wind (yikes)
48.6.1 Assuming sphericity (not recommended)
What if we said “screw it” and wanted to throw Cohen’s caution to the wind and use the error terms from the omnibus ANOVA? After all, this gives us more power, pow-ah, pow-ahhhh!
Well, given how we’ve specified the model, we would need to do two things. First, we would need to re-build our model, specifying that we want the output to include an aov object (an annoying but necessary step here):
omnibus_aov <- afex::aov_ez(id = "Subject",
                            dv = "Recalled",
                            data = factorial_df,
                            within = c("ProcessDepth", "Day"),
                            include_aov = TRUE)

# alternative using aov_car
# omnibus_aov <- afex::aov_car(Recalled ~ ProcessDepth * Day + Error(Subject/(ProcessDepth*Day)),
#                              data = factorial_df,
#                              include_aov = TRUE)
Then we would simply run the simple effects tests with a univariate model. Note that this method attempts to deal with potential violations of our sphericity assumption by using Satterthwaite degrees of freedom. Keep in mind, we are less protected from Type I error in this case.
joint_tests(omnibus_aov, by = "ProcessDepth", model = "univariate")
ProcessDepth = Lo:
model term df1 df2 F.ratio p.value
Day 2 47.6 0.581 0.5635
ProcessDepth = Med:
model term df1 df2 F.ratio p.value
Day 2 47.6 11.049 0.0001
ProcessDepth = High:
model term df1 df2 F.ratio p.value
Day 2 47.6 40.457 <.0001
As a follow up we would run the pairwise tests using the univariate model (pooled error term):
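Here’s a sketch of that call, mirroring the multivariate post-hoc tests from above but swapping in model = "univariate":

# same post-hoc call as before, but using the pooled (univariate) error term
post_hoc_tests <- emmeans(omnibus_aov, by = "ProcessDepth", specs = ~ Day,
                          model = "univariate", adjust = "tukey")
post_hoc_tests %>% pairs(adjust = "tukey")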