45 afex is for ANOVA

pacman::p_load(tidyverse, rstatix, afex)

So far, we have been running our ANOVA using either lm() or aov(), which is just a wrapper for lm(). This has been done for pedagogical reasons: (1) to re-enforce that ANOVA is just a variant of linear regression and (2) having you write out the formula gives you a better idea of how the model is specified (i.e. Scores as a function of Lecture, Presentation, and their interaction = Scores~Lecture + Presentation + Lecture * Presentation).

That said, there are cleaner tools for running simple ANOVA in R. For example, afex offers several clean and intuitive alternatives for running ANOVA. In fact, afex might be preferable, as the output gives you most everything you need to report your ANOVA, without needing to pipe the model into rstatix::anova_test, and as we enter into Repeated Measures ANOVA, afex methods make the necessary corrections for you if your data violates certain assumptions needed for ANOVA.

Indeed, while I have been stressing the lm() method to re-enforce that ANOVA and linear regression are two sides of the same coin, when performing a simple ANOVA I hardly ever use this method (you can hate me now / but I won’t stop now). I typically use afex::aov_car() unless I am planning some very complex comparisons or analyses related to my model or afex::``aov_ez(). In this walkthrough, I’ll introduce you to both—which you choose may depend on your own personal preference.

Why the switch-up? Well, lm() methods don’t allow you to easily specify a repeated measures model (it can be done, but can quickly become quite unwieldy). Remember my anectdote about my stats friend that told me that “there’s no such thing as a repeated measures ANOA, there is just an ANOVA where you have violated your independence assumption and you are trying to correct for it?”. Well that essentially applies here. If you were to use lm() you would need to build in your model specification.

In what follows we will take a look at how to run ANOVA using two afex methods as an alternative to lm(). This will become critical, as when we get to running Repeated Measures ANOVA, I do not recommend using lm() (not that its wrong… its just more difficult than necessary.)

45.1 Getting the data:

To start, let’s grab (and clean) some familiar data (this is from the Factorial ANOVA walkthrough). Note that I am cleaning up the numeric coding to reflect my factor levels

dataset <- read_delim("https://www.uvm.edu/~statdhtx/methods8/DataFiles/Sec13-5.dat", "\t", escape_double = FALSE, trim_ws = TRUE)

Rows: 135 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
dbl (4): Task, Smkgrp, score, covar

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

dataset$Task <- recode_factor(dataset$Task, "1" = "Pattern Recognition", "2" = "Cognitive", "3" = "Driving Simulation") 
dataset$Smkgrp <- recode_factor(dataset$Smkgrp, "1" = "Nonsmoking", "2" = "Delayed", "3" = "Active")

45.2 Hey participant! You got any ID?!?

When I introduced ANOVA a few weeks back, you may remember that I stressed that it would be a good idea to ensure that your dataset contained a column that assigned each unique participant a unique ID. For the first few weeks of ANOVA, this was simply good practice, but ultimately not necessary. This is no longer the case. Moving forward, when running ANOVA that has a repeated measures / within-subjects design, you need to have a participant ID column, in order for R to match the appropriate data. That is if Johnny is has four scores in the data set (e.g., Johnny’s score on Monday, Tues, Thurs, and Fri) R uses the participant ID to match each of those scores with one another. This is because, when running a repeated measures design, the ANOVA needs to take into account within-subjects correlations of scores (i.e., Johnny’s scores are more tightly correlated with one another than they are with Jenny’s scores).

For now we’ll simply create a column that assigns each row / score a unique participant ID. This would be consistent with the between-subjects designs we have encountered up to this point. However, note that if R finds instances where PartID are the same it will assume that data comes from the same participant (i.e., we have a within-subjects or repeated measures design). For our purposes now, this is easily solved by simply creating a column that runs from 1 to the number of observations in our data. After that we can proceed with the ANOVA.

# create a PartID column and number, every score comes from a unique person
dataset$PartID <- 
  seq_along(dataset$score) %>%
  as.factor()

45.3 running an ANOVA using the `afex` package:

Below, I’ll highlight TWO different ways of executing an ANOVA using afex. The first is aov_ez which is drastically different way of inputting the model terms than we are used to. That said, many prefer it, as it really spells out for you what you should enter for each parameter.

Recall that using lm() we specified our model as:

aov_model <- lm(score~ Task * Smkgrp, dataset)
aov_model %>% rstatix::anova_test(effect.size = "ges", type = 3)

ANOVA Table (type III tests)

       Effect DFn DFd      F        p p<.05     ges
1        Task   2 126 26.265 2.92e-10     * 0.29400
2      Smkgrp   2 126  0.010 9.90e-01       0.00016
3 Task:Smkgrp   4 126  2.943 2.30e-02     * 0.08500

45.3.1 using `aov_ez()`

To run the same model using aov_ez() we:

aov_model <- afex::aov_ez(
  id = "PartID", # column identifying your participants
  dv = "score", # column that contains your dv
  data = dataset, # your data frame
  between = c("Task", "Smkgrp"), # column(s) containing your between IV(s)
  within = NULL, # column(s) containing your within IV(s)
  type = 3 # type of Sum of Squares (note SPSS does 3)
  # return = "afex_aov", 
  # covariate = NULL, # column(s) containing your covariate(s)
  # observed = NULL,
  # fun_aggregate = NULL, 
  # factorize = TRUE, 
  # check_contrasts = TRUE,
  # anova_table = list(es = "pes")
  ) # adding partial eta squared effect size

Contrasts set to contr.sum for the following variables: Task, Smkgrp

Note that above I specified every argument in the function. This was not necessary (indeed if you use Rstudio to help you fill this in, you’ll notice that many of the defaults are NULL). I also didn’t need to specify within = NULL but wanted to highlight this is where within-IVs go.

Briefly running down each of the arguments for aov_ez():

id: column identifying your participants
dv: column that contains your dv
data: your data frame
between: column(s) containing your between IV(s)
within: column(s) containing your within IV(s)
return: what kind of object do you want to return? e.g., lm, aov. The default afex_aov is typically preferable.
covariate: column(s) containing your covariate(s) for ANCOVA
observed: column of any variables are observed (i.e, measured) but not experimentally manipulated (typically NULL)
fun_aggregate: function for aggregating the data before running the ANOVA, defaults to mean
type: type of Sum of Squares, see the Field text on when to change this value
factorize: should between subject factors be factorized before running the analysis. This is useful if your data is still numerically coded. This needs to be set to false if you have a covariate
check_contrasts: should between-subject factors be checked and (if necessary) changed to be “contr.sum”, usually leave this as TRUE
anova_table: list of further arguments passed to function producing the ANOVA table including what kind of effect size (partial v. general eta squared)? and any corrections (e.g., bonferroni). See the afex documentation for more details.

For now I want to comment on a few of the choices I made.

Let’s take at the aov_model we just created by sending it to anova().

aov_model %>% anova(es = "ges")

Anova Table (Type 3 tests)

Response: score
            num Df den Df    MSE        F     ges    Pr(>F)    
Task             2    126 107.83 132.8954 0.67840 < 2.2e-16 ***
Smkgrp           2    126 107.83   8.4098 0.11777  0.000373 ***
Task:Smkgrp      4    126 107.83   2.9430 0.08545  0.022972 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

45.3.2 using `aov_car`

We can also specify the ANOVA model using aov_car(). This method is very similar to how we have been specifying models thus far. One advantage of this method is that it involves less typing. The new wrinkle is that you MUST specify the structure of your Error term. In between ANOVA (as we have done up to this point) is simply involves adding + Error(PartID) to our model formula:

aov_model <- 
  aov_car(data = dataset,
          score ~ Smkgrp * Task + 
            Error(PartID))

Contrasts set to contr.sum for the following variables: Smkgrp, Task

aov_model %>% anova(es = "ges")

Anova Table (Type 3 tests)

Response: score
            num Df den Df    MSE        F     ges    Pr(>F)    
Smkgrp           2    126 107.83   8.4098 0.11777  0.000373 ***
Task             2    126 107.83 132.8954 0.67840 < 2.2e-16 ***
Smkgrp:Task      4    126 107.83   2.9430 0.08545  0.022972 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

45.4 doing more with `afex`

The afex package can be used for much more that simple running ANOVA. For example it’s got its own plotting functions and can even be used for multi-level modeling (although I typically on use it for quick and dirty modeling). Much of this is outside the scope of this class, but if you have time I certainly recommend you check out https://github.com/singmann/afex.

45.1 Getting the data:

45.2 Hey participant! You got any ID?!?

45.3 running an ANOVA using the afex package:

45.3.1 using aov_ez()

45.3.2 using aov_car

45.4 doing more with afex

45.3 running an ANOVA using the `afex` package:

45.3.1 using `aov_ez()`

45.3.2 using `aov_car`

45.4 doing more with `afex`