Let’s ramp up the complexity. Here is a dataset testing Recall over three days as a function of depth of processing (Lo, Med, High). In this case we have two within-subjects factors.
Rows: 90 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): ProcessDepth, Day
dbl (2): Subject, Recalled
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
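If we wanted to quiet this message, we could declare the column types up front when reading the file. A quick sketch (the filename here is hypothetical):

# filename is hypothetical; col_types = "ddcc" means double, double, character, character
factorial_df <- readr::read_csv("factorial_recall_data.csv", col_types = "ddcc")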
factorial_df
# A tibble: 90 × 4
Subject Recalled ProcessDepth Day
<dbl> <dbl> <chr> <chr>
1 1 14 High Day1
2 2 10 High Day1
3 3 13 High Day1
4 4 14 High Day1
5 5 12 High Day1
6 6 8 High Day1
7 7 13 High Day1
8 8 11 High Day1
9 9 9 High Day1
10 10 12 High Day1
# ℹ 80 more rows
One thing to note is that Subject is being treated as a <dbl> number. It should be treated as a factor.
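A quick fix, assuming the tidyverse is loaded as in the earlier walkthroughs:

# convert Subject from a number to a factor so it is treated as a discrete grouping variable
factorial_df <- factorial_df %>%
  mutate(Subject = factor(Subject))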
Since I have two factors here, one strategy may be to facet my spaghetti plots. In this case I am creating separate spaghetti plots for each level of ProcessDepth:
p <- ggplot(data = factorial_df, aes(x = Day, y = Recalled, group = Subject)) +
  geom_point(aes(col = Subject)) +
  geom_line(aes(col = Subject)) +
  facet_wrap(~ProcessDepth)
plotly::ggplotly(p)
FWIW, one thing that this plot has clued me in on is that the order of my Processing Depth levels is High, Lo, Med (R’s default alphabetical ordering). I might actually want that to be Lo, Med, High:
# reorder the factor levels so the facets appear Lo, Med, High
factorial_df <- factorial_df %>%
  mutate(ProcessDepth = factor(ProcessDepth, levels = c("Lo", "Med", "High")))

p <- ggplot(data = factorial_df, aes(x = Day, y = Recalled, group = Subject)) +
  geom_point(aes(col = Subject)) +
  geom_line(aes(col = Subject)) +
  facet_wrap(~ProcessDepth)
plotly::ggplotly(p)
48.2.2 means plots
Again, we need to make the appropriate correction for our error bars. In this case the number of within groups is the number of cells formed by crossing Day (1, 2, 3) and Processing Depth (Lo, Med, High). As this is a 3 × 3 design there are 9 cells, or nWithinGroups = 9. As before, this is handled auto-magically in summarySEwithin2 (a sketch follows below).
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
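Here’s a minimal sketch of one way to build the means plot, assuming the summarySEwithin2() helper from the earlier walkthroughs is loaded (and, per the warning above, using linewidth rather than the deprecated size aesthetic):

# get condition means and corrected (within-subjects) error bars;
# assumes summarySEwithin2() from the earlier walkthroughs is loaded
recall_means <- summarySEwithin2(data = factorial_df,
                                 measurevar = "Recalled",
                                 withinvars = c("ProcessDepth", "Day"),
                                 idvar = "Subject")

ggplot(recall_means, aes(x = Day, y = Recalled,
                         group = ProcessDepth, color = ProcessDepth)) +
  geom_line(linewidth = 0.75) +
  geom_point() +
  geom_errorbar(aes(ymin = Recalled - se, ymax = Recalled + se), width = 0.1) +
  theme_classic()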
48.2.3 rain cloud plot
Here’s how we might construct a rain cloud plot for this. For what it’s worth, this took a lot of trial and error to get the positioning right. I’d only do this for plots you intend to share with the world. Also note, I’m including the viridis package here (including scale_fill_viridis). This is a package that provides a color palette that is colorblind friendly. See here for more information.
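A sketch of one possible construction using the ggdist package (an assumption on my part; the exact geoms and positioning tweaks in the original may differ):

library(ggdist)
library(viridis)

ggplot(factorial_df, aes(x = Day, y = Recalled, fill = Day)) +
  # the "cloud": a half-eye density for each Day
  stat_halfeye(adjust = 0.5, justification = -0.2, .width = 0, point_colour = NA) +
  # a narrow boxplot tucked beneath each cloud
  geom_boxplot(width = 0.12, outlier.shape = NA, alpha = 0.5) +
  # the "rain": the raw data points, dotplot-style, to the left
  stat_dots(side = "left", justification = 1.1, binwidth = 0.25) +
  facet_wrap(~ProcessDepth) +
  scale_fill_viridis(discrete = TRUE) +
  theme_classic()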
48.3 running the omnibus ANOVA

omnibus_aov <- afex::aov_ez(id = "Subject",
                            dv = "Recalled",
                            data = factorial_df,
                            within = c("ProcessDepth", "Day"))

# alternative using aov_car
# omnibus_aov <- afex::aov_car(Recalled ~ ProcessDepth * Day + Error(Subject/(ProcessDepth*Day)),
#                              data = factorial_df)
Here we have two main effects and an interaction. Let’s unpack the interaction by taking a look at whether Recall increases over successive days.
48.4 running the simple effects ANOVAs
The Cohen chapter assigned this week recommends that one avoid the use of pooled error terms when performing simple effects analysis of within-subjects ANOVA. This is definitely the case when the sphericity assumption is violated, but as a general rule, even when the assumption is not technically violated, deviations from sphericity can artificially inflate Type I error. So in this case, simple follow-up ANOVAs that refer to their own error terms are justified. That said, as Type I error goes down, Type II error goes up. Something to consider.
Something else to consider—remember this little tidbit from earlier:
[…] for within-subjects (or repeated measures) effects, the error term is the Treatment x Subjects interaction, and the nature of the TxS interaction across all treatment levels can be very different than it is for any particular pair of treatment levels.
This applies to simple effects ANOVA as well.
This can be accomplished using joint_tests() from emmeans, telling it how we want to split up our original model. As before, we need to add model = "multivariate" to our call (this tells R to use the simple-effects error term and not the omnibus one).
joint_tests(omnibus_aov, by = "ProcessDepth", model = "multivariate")
ProcessDepth = Lo:
model term df1 df2 F.ratio p.value
Day 2 9 1.487 0.2768
ProcessDepth = Med:
model term df1 df2 F.ratio p.value
Day 2 9 30.146 0.0001
ProcessDepth = High:
model term df1 df2 F.ratio p.value
Day 2 9 32.045 0.0001
48.4.0.1 a note on Univariate v. Multivariate methods
You might be asking yourself “why multivariate in the model argument above?” Briefly, univariate v. multivariate statistical methods refer to the number of dependent variables associated with the model per fundamental unit of analysis. In psychology research the fundamental unit of analysis is typically the participant, with measures taken at the level of the participant. A univariate model has one dependent variable per participant whereas a multivariate model has two or more. Perhaps some of you are more familiar with this distinction in the context of one-way MANOVA, which analyzes two or more dependent variables simultaneously in an ANOVA design. MANOVA (multivariate ANOVA) is sometimes preferred to running separate univariate ANOVAs if one believes that the dependent variables involved are correlated with one another. This is partly because MANOVA doesn’t rely on the assumption of sphericity but does consider the covariance among the measures. In this case MANOVA factors these correlations into the model testing the IVs.
I bring all of this up because one way to conceptualize repeated-measures ANOVA is as a multivariate method: multiple measures are taken at the level of the participant. Specifying multivariate tells R to treat it this way and therefore derive the error term and degrees of freedom from a Multivariate Analysis of Variance (MANOVA). In MANOVA your degrees of freedom are reduced, and it tends to be a less powerful statistical method overall, leading to a higher likelihood of Type II errors. Applying MANOVA to a dataset, one typically obtains Pillai’s Trace, a statistic ranging from 0 to 1, where larger values indicate that a term contributes more to the overall model. Additionally, MANOVA provides an approximated F-value.
Below I contrast the two approaches: running our data as an RM-ANOVA vs. as a MANOVA. FWIW, you don’t need to explicitly call the MANOVA in your omnibus test; I’m only showing the MANOVA as a demonstration. In short, you can simply proceed as usual and specify “multivariate” in any follow-ups.
omnibus_aov <- afex::aov_ez(id = "Subject",
                            dv = "Recalled",
                            data = factorial_df,
                            within = c("ProcessDepth", "Day"))
omnibus_aov
# alternative using aov_car
# omnibus_aov <- afex::aov_car(Recalled ~ ProcessDepth * Day + Error(Subject/(ProcessDepth*Day)),
#                              data = factorial_df)
omnibus_maov <- afex::aov_ez(id = "Subject",
                             dv = "Recalled",
                             data = factorial_df,
                             within = c("ProcessDepth", "Day"),
                             return = "Anova")
omnibus_maov
Type III Repeated Measures MANOVA Tests: Pillai test statistic
Df test stat approx F num Df den Df Pr(>F)
(Intercept) 1 0.99697 2957.15 1 9 1.209e-12 ***
ProcessDepth 1 0.94785 72.70 2 8 7.396e-06 ***
Day 1 0.89993 35.97 2 8 0.0001003 ***
ProcessDepth:Day 1 0.88596 11.65 4 6 0.0054256 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We see that the p-values for our tests have increased (though they remain significant), indicating the loss in power.
48.4.1 post-hoc tests
Given that we found an effect for Day at the High and Med levels of ProcessDepth, we need to perform post-hoc follow-ups. Again let’s specify model = "multivariate":
post_hoc_tests <- emmeans(omnibus_aov, by = "ProcessDepth", specs = ~ Day,
                          model = "multivariate", adjust = "tukey")
post_hoc_tests %>% pairs(adjust = "tukey")
Remember from this table we should only focus on ProcessDepth = Med and ProcessDepth = High.
48.5 Writing this up
… To test this hypothesis we ran a 3 (Process Depth) × 3 (Day) repeated-measures ANOVA. Where there were violations of sphericity, we used Greenhouse-Geisser corrected degrees of freedom. This analysis revealed a significant interaction, \(F\)(2.86, 25.78) = 10.64, \(p\) < .001, \(\eta_p^2\) = .54. To test this interaction we ran separate simple effects ANOVAs for Day at each level of Process Depth. A simple effect for Day was found for High processing depth, \(F\)(2, 9) = 32.05, \(p\) < .001, where there were statistically significant increases in performance from each Day to the next (Tukey HSD, \(p\) < .05). A simple effect for Day was also found for Med processing depth, \(F\)(2, 9) = 30.15, \(p\) < .001. While scores increased across the 3 days, only Day 3 was significantly greater than the other two (\(p\)s < .05). No simple effect was observed for Day in the Lo processing depth condition.
See the next walkthrough for how to do this in SPSS.
48.6 Epilogue: throwing caution to the wind (yikes)
48.6.1 Assuming sphericity (not recommended)
What if we said “screw it” and wanted to throw Cohen’s caution to the wind and use the error terms from the omnibus ANOVA? After all, this gives us more power, pow-ah, pow-ahhhh!
Well, given how we’ve specified the model, we would need to do two things. First, we would need to re-build our model, specifying that we want the output to include an aov object (an annoying but necessary step here):
omnibus_aov <- afex::aov_ez(id = "Subject",
                            dv = "Recalled",
                            data = factorial_df,
                            within = c("ProcessDepth", "Day"),
                            include_aov = TRUE)

# alternative using aov_car
# omnibus_aov <- afex::aov_car(Recalled ~ ProcessDepth * Day + Error(Subject/(ProcessDepth*Day)),
#                              data = factorial_df,
#                              include_aov = TRUE)
Then we would simply run the simple effects tests with a univariate model. Note that this method attempts to deal with potential violations of our sphericity assumption by using Satterthwaite degrees of freedom. Keep in mind, we are less protected from Type I error in this case.
joint_tests(omnibus_aov, by = "ProcessDepth", model = "univariate")
ProcessDepth = Lo:
model term df1 df2 F.ratio p.value
Day 2 47.6 0.581 0.5635
ProcessDepth = Med:
model term df1 df2 F.ratio p.value
Day 2 47.6 11.049 0.0001
ProcessDepth = High:
model term df1 df2 F.ratio p.value
Day 2 47.6 40.457 <.0001
As a follow up we would run the pairwise tests using the univariate model (pooled error term):
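Here’s a sketch of that call, mirroring the multivariate post-hoc tests from above but swapping in model = "univariate":

# same post-hoc call as before, but using the pooled (univariate) error term
post_hoc_tests <- emmeans(omnibus_aov, by = "ProcessDepth", specs = ~ Day,
                          model = "univariate", adjust = "tukey")
post_hoc_tests %>% pairs(adjust = "tukey")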