::p_load(tidyverse, rstatix, afex) pacman
45 afex is for ANOVA
So far, we have been running our ANOVA using either lm()
or aov()
, which is just a wrapper for lm()
. This has been done for pedagogical reasons: (1) to re-enforce that ANOVA is just a variant of linear regression and (2) having you write out the formula gives you a better idea of how the model is specified (i.e. Scores as a function of Lecture, Presentation, and their interaction = Scores~Lecture + Presentation + Lecture * Presentation
).
That said, there are cleaner tools for running simple ANOVA in R. For example, afex
offers several clean and intuitive alternatives for running ANOVA. In fact, afex
might be preferable, as the output gives you most everything you need to report your ANOVA, without needing to pipe the model into rstatix::anova_test
, and as we enter into Repeated Measures ANOVA, afex
methods make the necessary corrections for you if your data violates certain assumptions needed for ANOVA.
Indeed, while I have been stressing the lm()
method to re-enforce that ANOVA and linear regression are two sides of the same coin, when performing a simple ANOVA I hardly ever use this method (you can hate me now / but I won’t stop now). I typically use afex::aov_car()
unless I am planning some very complex comparisons or analyses related to my model or afex::``aov_ez()
. In this walkthrough, I’ll introduce you to both—which you choose may depend on your own personal preference.
Why the switch-up? Well, lm()
methods don’t allow you to easily specify a repeated measures model (it can be done, but can quickly become quite unwieldy). Remember my anectdote about my stats friend that told me that “there’s no such thing as a repeated measures ANOA, there is just an ANOVA where you have violated your independence assumption and you are trying to correct for it?”. Well that essentially applies here. If you were to use lm()
you would need to build in your model specification.
In what follows we will take a look at how to run ANOVA using two afex
methods as an alternative to lm()
. This will become critical, as when we get to running Repeated Measures ANOVA, I do not recommend using lm()
(not that its wrong… its just more difficult than necessary.)
45.1 Getting the data:
To start, let’s grab (and clean) some familiar data (this is from the Factorial ANOVA walkthrough). Note that I am cleaning up the numeric coding to reflect my factor levels
<- read_delim("https://www.uvm.edu/~statdhtx/methods8/DataFiles/Sec13-5.dat", "\t", escape_double = FALSE, trim_ws = TRUE) dataset
Rows: 135 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
dbl (4): Task, Smkgrp, score, covar
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
$Task <- recode_factor(dataset$Task, "1" = "Pattern Recognition", "2" = "Cognitive", "3" = "Driving Simulation")
dataset$Smkgrp <- recode_factor(dataset$Smkgrp, "1" = "Nonsmoking", "2" = "Delayed", "3" = "Active") dataset
45.2 Hey participant! You got any ID?!?
When I introduced ANOVA a few weeks back, you may remember that I stressed that it would be a good idea to ensure that your dataset contained a column that assigned each unique participant a unique ID. For the first few weeks of ANOVA, this was simply good practice, but ultimately not necessary. This is no longer the case. Moving forward, when running ANOVA that has a repeated measures / within-subjects design, you need to have a participant ID column, in order for R
to match the appropriate data. That is if Johnny is has four scores in the data set (e.g., Johnny’s score on Monday, Tues, Thurs, and Fri) R
uses the participant ID to match each of those scores with one another. This is because, when running a repeated measures design, the ANOVA needs to take into account within-subjects correlations of scores (i.e., Johnny’s scores are more tightly correlated with one another than they are with Jenny’s scores).
For now we’ll simply create a column that assigns each row / score a unique participant ID. This would be consistent with the between-subjects designs we have encountered up to this point. However, note that if R
finds instances where PartID
are the same it will assume that data comes from the same participant (i.e., we have a within-subjects or repeated measures design). For our purposes now, this is easily solved by simply creating a column that runs from 1 to the number of observations in our data. After that we can proceed with the ANOVA.
# create a PartID column and number, every score comes from a unique person
$PartID <-
datasetseq_along(dataset$score) %>%
as.factor()
45.3 running an ANOVA using the afex
package:
Below, I’ll highlight TWO different ways of executing an ANOVA using afex
. The first is aov_ez
which is drastically different way of inputting the model terms than we are used to. That said, many prefer it, as it really spells out for you what you should enter for each parameter.
Recall that using lm()
we specified our model as:
<- lm(score~ Task * Smkgrp, dataset)
aov_model %>% rstatix::anova_test(effect.size = "ges", type = 3) aov_model
ANOVA Table (type III tests)
Effect DFn DFd F p p<.05 ges
1 Task 2 126 26.265 2.92e-10 * 0.29400
2 Smkgrp 2 126 0.010 9.90e-01 0.00016
3 Task:Smkgrp 4 126 2.943 2.30e-02 * 0.08500
45.3.1 using aov_ez()
To run the same model using aov_ez()
we:
<- afex::aov_ez(
aov_model id = "PartID", # column identifying your participants
dv = "score", # column that contains your dv
data = dataset, # your data frame
between = c("Task", "Smkgrp"), # column(s) containing your between IV(s)
within = NULL, # column(s) containing your within IV(s)
type = 3 # type of Sum of Squares (note SPSS does 3)
# return = "afex_aov",
# covariate = NULL, # column(s) containing your covariate(s)
# observed = NULL,
# fun_aggregate = NULL,
# factorize = TRUE,
# check_contrasts = TRUE,
# anova_table = list(es = "pes")
# adding partial eta squared effect size )
Contrasts set to contr.sum for the following variables: Task, Smkgrp
Note that above I specified every argument in the function. This was not necessary (indeed if you use Rstudio to help you fill this in, you’ll notice that many of the defaults are NULL). I also didn’t need to specify within = NULL
but wanted to highlight this is where within-IVs go.
Briefly running down each of the arguments for aov_ez()
:
id
: column identifying your participantsdv
: column that contains your dvdata
: your data framebetween
: column(s) containing your between IV(s)within
: column(s) containing your within IV(s)return
: what kind of object do you want to return? e.g.,lm
,aov
. The defaultafex_aov
is typically preferable.covariate
: column(s) containing your covariate(s) for ANCOVAobserved
: column of any variables are observed (i.e, measured) but not experimentally manipulated (typically NULL)fun_aggregate
: function for aggregating the data before running the ANOVA, defaults to meantype
: type of Sum of Squares, see the Field text on when to change this valuefactorize
: should between subject factors be factorized before running the analysis. This is useful if your data is still numerically coded. This needs to be set to false if you have a covariatecheck_contrasts
: should between-subject factors be checked and (if necessary) changed to be “contr.sum”, usually leave this asTRUE
anova_table
: list of further arguments passed to function producing the ANOVA table including what kind of effect size (partial v. general eta squared)? and any corrections (e.g., bonferroni). See theafex
documentation for more details.
For now I want to comment on a few of the choices I made.
Let’s take at the aov_model
we just created by sending it to anova()
.
%>% anova(es = "ges") aov_model
Anova Table (Type 3 tests)
Response: score
num Df den Df MSE F ges Pr(>F)
Task 2 126 107.83 132.8954 0.67840 < 2.2e-16 ***
Smkgrp 2 126 107.83 8.4098 0.11777 0.000373 ***
Task:Smkgrp 4 126 107.83 2.9430 0.08545 0.022972 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
45.3.2 using aov_car
We can also specify the ANOVA model using aov_car()
. This method is very similar to how we have been specifying models thus far. One advantage of this method is that it involves less typing. The new wrinkle is that you MUST specify the structure of your Error
term. In between ANOVA (as we have done up to this point) is simply involves adding + Error(PartID)
to our model formula:
<-
aov_model aov_car(data = dataset,
~ Smkgrp * Task +
score Error(PartID))
Contrasts set to contr.sum for the following variables: Smkgrp, Task
%>% anova(es = "ges") aov_model
Anova Table (Type 3 tests)
Response: score
num Df den Df MSE F ges Pr(>F)
Smkgrp 2 126 107.83 8.4098 0.11777 0.000373 ***
Task 2 126 107.83 132.8954 0.67840 < 2.2e-16 ***
Smkgrp:Task 4 126 107.83 2.9430 0.08545 0.022972 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
45.4 doing more with afex
The afex
package can be used for much more that simple running ANOVA. For example it’s got its own plotting functions and can even be used for multi-level modeling (although I typically on use it for quick and dirty modeling). Much of this is outside the scope of this class, but if you have time I certainly recommend you check out https://github.com/singmann/afex.