pacman::p_load(rstatix, cowplot, tidyverse, emmeans, performance)
41 Simple Example Analysis
At this point I’d like to pause and run through a simple example analysis, step by step, based on some data we worked with in class. As with the last one of these, think of this as a tl;dr version of the previous two walkthroughs.
41.1 Evaluating Exam Performance in Different Learning Environments
In the wake of evolving educational systems, a school district is keen to optimize learning experiences and academic performance. They have integrated both traditional classroom settings and online platforms for educational delivery. Additionally, they offer a range of curricula: Standard, Intermediate, and Advanced, to cater to students with varying academic abilities.
The educational board hypothesizes that classroom environment and curriculum type may have distinct impacts on students’ exam performance. To investigate this, they decide to conduct a study involving high school students from multiple schools within the district.
41.1.0.1 Objectives:
- To determine if a traditional classroom environment is more conducive to learning compared to an online setting, taking into account varying curricula.
- To assess if the level of curriculum—Standard, Intermediate, or Advanced—significantly affects exam scores, and whether this effect differs between traditional and online classrooms.
41.1.0.2 Participants:
Approximately 300 high school students participate in this study. Students are equally distributed among traditional and online settings and among the three curriculum types.
41.1.0.3 Methodology:
- Students are randomly assigned to one of the two classroom environments (Traditional or Online).
- Within each classroom environment, students are further assigned to one of the three types of curriculum (Standard, Intermediate, Advanced).
- At the end of the semester, students’ exam scores are recorded. The exam is standardized and is out of 100 points.
41.1.0.4 Data Collection:
- Classroom Environment: Traditional or Online
- Curriculum Type: Standard, Intermediate, Advanced
- Exam Score: Ranging from 0 to 100
41.1.0.5 Data Analysis Plan:
- Conduct a factorial ANOVA to investigate the main effects of classroom environment and curriculum type, as well as their interaction, on exam scores.
- If significant interactions are found, conduct post-hoc tests to understand the nature of these interactions.
41.1.0.6 Anticipated Outcomes:
The educational board is particularly interested in identifying whether certain combinations of classroom environment and curriculum type yield significantly higher exam scores, as this information could guide future educational strategies within the district.
Summary of design
Independent Variables:
- Classroom Environment (Traditional, Online)
- Curriculum Type (Standard, Intermediate, Advanced)
Dependent Variable:
- Exam Score (0-100)
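As a quick sanity check on the design, here is a minimal sketch of what a fully balanced 2 × 3 layout looks like in R. The per-cell count of 50 is taken from the summary table later in this chapter; the object names (`design`, `cell_counts`) and the `student` index are hypothetical and only serve the illustration.

```r
# Hypothetical reconstruction of the design grid:
# 2 classroom environments x 3 curricula, 50 students per cell (assumed balanced)
design <- expand.grid(
  student = 1:50,
  Curriculum = c("Standard", "Intermediate", "Advanced"),
  ClassEnv = c("Traditional", "Online"),
  stringsAsFactors = FALSE
)

# Cross-tabulate to confirm every cell has the same number of students
cell_counts <- table(design$ClassEnv, design$Curriculum)
print(cell_counts)

# Total sample size implied by the design
print(nrow(design))
```

Running the same `table()` call on the real data is a cheap way to confirm the design is balanced before fitting anything.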
41.2 Analysis
41.2.1 Dependencies
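Packages: the pacman::p_load() call at the top of this chapter loads rstatix, cowplot, tidyverse, emmeans, and performance.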
Data
learning_df <- read_csv("http://tehrandav.is/courses/statistics/practice_datasets/learning_environments.csv")
Rows: 300 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): ClassEnv, Curriculum
dbl (1): ExamScore
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(learning_df)
spc_tbl_ [300 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ ClassEnv : chr [1:300] "Traditional" "Traditional" "Traditional" "Traditional" ...
$ Curriculum: chr [1:300] "Standard" "Standard" "Standard" "Standard" ...
$ ExamScore : num [1:300] 57 76 78 56 63 67 77 73 80 76 ...
- attr(*, "spec")=
.. cols(
.. ClassEnv = col_character(),
.. Curriculum = col_character(),
.. ExamScore = col_double()
.. )
- attr(*, "problems")=<externalptr>
learning_df %>% head()
# A tibble: 6 × 3
ClassEnv Curriculum ExamScore
<chr> <chr> <dbl>
1 Traditional Standard 57
2 Traditional Standard 76
3 Traditional Standard 78
4 Traditional Standard 56
5 Traditional Standard 63
6 Traditional Standard 67
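Both predictors are read in as character columns, so R will order their levels alphabetically (Advanced, Intermediate, Standard) in tables and plots. If you would rather have the levels follow the natural order of the design, one option is to convert them to factors with explicit levels. The sketch below uses a small made-up data frame (`toy_df` is hypothetical, not the class data) so that it runs on its own; the same two `factor()` calls could be applied to the real columns.

```r
# Toy stand-in for the real data (hypothetical rows, same column names)
toy_df <- data.frame(
  ClassEnv = c("Traditional", "Online", "Traditional", "Online"),
  Curriculum = c("Standard", "Advanced", "Intermediate", "Standard"),
  stringsAsFactors = FALSE
)

# Convert to factors, stating the level order explicitly
toy_df$ClassEnv <- factor(toy_df$ClassEnv,
                          levels = c("Traditional", "Online"))
toy_df$Curriculum <- factor(toy_df$Curriculum,
                            levels = c("Standard", "Intermediate", "Advanced"))

# Levels now follow the design order rather than alphabetical order
print(levels(toy_df$Curriculum))
print(levels(toy_df$ClassEnv))
```

This is optional. Note that reordering changes the order of rows in summary tables and the direction of some contrasts, so output would no longer line up row for row with what is printed in this chapter.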
41.2.2 Descriptive Stats
Summary Table:
# table of cell means
learning_df %>%
  group_by(ClassEnv, Curriculum) %>%
  rstatix::get_summary_stats(type = "mean_se")
# A tibble: 6 × 6
ClassEnv Curriculum variable n mean se
<chr> <chr> <fct> <dbl> <dbl> <dbl>
1 Online Advanced ExamScore 50 85.5 1.38
2 Online Intermediate ExamScore 50 71.6 1.35
3 Online Standard ExamScore 50 63.9 1.41
4 Traditional Advanced ExamScore 50 78.7 1.35
5 Traditional Intermediate ExamScore 50 75.9 1.35
6 Traditional Standard ExamScore 50 71.5 1.49
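For anyone wondering what the "mean_se" summary is doing, the standard error of the mean is just the standard deviation divided by the square root of the sample size. Here is a minimal base-R sketch; the helper name `mean_se_by_hand` is hypothetical, and the six scores are only the first six rows printed by `head()` above, not a full cell, so the numbers will not match the table.

```r
# First six exam scores from the head() output above (not a complete cell)
scores <- c(57, 76, 78, 56, 63, 67)

# Hypothetical helper: mean and standard error of the mean
mean_se_by_hand <- function(x) {
  c(mean = mean(x), se = sd(x) / sqrt(length(x)))
}

stats <- mean_se_by_hand(scores)
print(stats)
```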
Figure
ggplot(learning_df, aes(x = ClassEnv, y = ExamScore, group = Curriculum)) +
stat_summary(fun.data = "mean_se", geom = "pointrange", aes(shape = Curriculum)) +
stat_summary(fun = "mean", geom = "line", aes(linetype = Curriculum)) +
theme_cowplot()
41.2.3 Build the model
In the past I’ve recommended lm() to reinforce that this is just another linear model. Moving forward I’m going to emphasize using aov(), which, as I mentioned before, is just a fancy wrapper for lm(). Using aov() provides the most consistent and efficient path for running ANOVA using the frameworks we’ve employed to this point.
learning_model <- aov(ExamScore ~ ClassEnv + Curriculum, data = learning_df)
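Note that the formula above uses `+`, so this model estimates the two main effects only. The analysis plan also mentions the interaction; a minimal sketch of the full factorial version swaps `+` for `*`. To keep the sketch runnable on its own, it uses simulated data: the cell means are borrowed from the summary table above and the standard deviation of 10 is an assumption, so the output will not match the real data. All object names (`cells`, `sim_df`, `interaction_model`) are hypothetical.

```r
set.seed(41)

# Six design cells; expand.grid varies Curriculum fastest, matching the
# row order of the summary table (Online first, then Traditional)
cells <- expand.grid(
  Curriculum = c("Advanced", "Intermediate", "Standard"),
  ClassEnv = c("Online", "Traditional")
)
cells$cell_mean <- c(85.5, 71.6, 63.9, 78.7, 75.9, 71.5)

# Simulate 50 students per cell (SD of 10 is an assumption)
sim_df <- cells[rep(seq_len(nrow(cells)), each = 50), ]
sim_df$ExamScore <- rnorm(nrow(sim_df), mean = sim_df$cell_mean, sd = 10)

# ClassEnv * Curriculum expands to both main effects plus their interaction
interaction_model <- aov(ExamScore ~ ClassEnv * Curriculum, data = sim_df)
anova_table <- summary(interaction_model)[[1]]
print(anova_table)
```

With the interaction term included, the residual degrees of freedom drop from 296 to 294, because two additional parameters are estimated.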
41.2.4 Test model assumptions
performance::check_normality(learning_model)
OK: residuals appear as normally distributed (p = 0.523).
performance::check_homogeneity(learning_model)
OK: There is not clear evidence for different variances across groups (Bartlett Test, p = 0.981).
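If the performance package is not available, roughly comparable checks can be run in base R. The sketch below uses `shapiro.test()` on the model residuals and `bartlett.test()` across the six design cells; the output above names the Bartlett test, but treating `shapiro.test()` as the counterpart of `check_normality()` is my assumption. The data are simulated (mean 70, SD 10, both arbitrary), and the names `sim_df`, `sim_model`, and `cell` are hypothetical.

```r
set.seed(41)

# Simulated stand-in for the real data: balanced 2 x 3 design, 50 per cell
sim_df <- expand.grid(
  id = 1:50,
  Curriculum = c("Standard", "Intermediate", "Advanced"),
  ClassEnv = c("Traditional", "Online")
)
sim_df$ExamScore <- rnorm(nrow(sim_df), mean = 70, sd = 10)

sim_model <- aov(ExamScore ~ ClassEnv + Curriculum, data = sim_df)

# Normality of residuals (Shapiro-Wilk)
normality <- shapiro.test(residuals(sim_model))
print(normality)

# Homogeneity of variance across the six cells (Bartlett)
sim_df$cell <- interaction(sim_df$ClassEnv, sim_df$Curriculum)
homogeneity <- bartlett.test(ExamScore ~ cell, data = sim_df)
print(homogeneity)
```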
41.2.5 Perform ANOVA
anova_test(learning_model, effect.size = "pes")
ANOVA Table (type II tests)
Effect DFn DFd F p p<.05 pes
1 ClassEnv 1 296 2.104 1.48e-01 0.007
2 Curriculum 2 296 49.190 3.60e-19 * 0.249
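The pes column is partial eta squared. It can be recovered from the F ratio and its degrees of freedom as \(\eta_p^2 = (F \times DFn) / (F \times DFn + DFd)\), which makes for a handy check when reading someone else's ANOVA table. A minimal sketch using the values printed above (the helper name `pes_from_f` is hypothetical):

```r
# Hypothetical helper: partial eta squared from an F ratio and its dfs
pes_from_f <- function(f, df_num, df_den) {
  (f * df_num) / (f * df_num + df_den)
}

pes_curriculum <- pes_from_f(49.190, df_num = 2, df_den = 296)
pes_classenv   <- pes_from_f(2.104,  df_num = 1, df_den = 296)

print(round(pes_curriculum, 3))  # 0.249, matching the table
print(round(pes_classenv, 3))    # 0.007, matching the table
```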
41.2.5.1 No Main effect for ClassEnv
- nothing to see here, moving on.
41.2.5.2 Main effect for Curriculum
Curriculum has three levels, so we need to run a post hoc test:
emm_mdl <- emmeans(learning_model, specs = ~Curriculum)
emm_mdl %>% pairs()
 contrast                estimate   SE  df t.ratio p.value
 Advanced - Intermediate     8.34 1.45 296   5.743  <.0001
 Advanced - Standard        14.34 1.45 296   9.875  <.0001
 Intermediate - Standard     6.00 1.45 296   4.132  0.0001

Results are averaged over the levels of: ClassEnv
P value adjustment: tukey method for comparing a family of 3 estimates
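For comparison, base R offers `TukeyHSD()`, which works directly on an `aov` object and, in a balanced design like this one, should give comparable pairwise comparisons. This is a sketch on simulated data (mean 70, SD 10, both arbitrary assumptions), so the estimates will not match those above; the names `sim_df`, `sim_model`, and `tukey` are hypothetical.

```r
set.seed(41)

# Simulated stand-in for the real data: balanced 2 x 3 design, 50 per cell
sim_df <- expand.grid(
  id = 1:50,
  Curriculum = c("Standard", "Intermediate", "Advanced"),
  ClassEnv = c("Traditional", "Online")
)
sim_df$ExamScore <- rnorm(nrow(sim_df), mean = 70, sd = 10)

sim_model <- aov(ExamScore ~ ClassEnv + Curriculum, data = sim_df)

# Tukey-adjusted pairwise comparisons for the Curriculum factor only
tukey <- TukeyHSD(sim_model, which = "Curriculum")
print(tukey)
```

The result is a list with one matrix per requested factor; each row is a pairwise difference with its confidence interval and adjusted p value.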
means table for curriculum
learning_df %>%
  group_by(Curriculum) %>%
  get_summary_stats(type = "mean_se")
# A tibble: 3 × 5
  Curriculum   variable      n  mean    se
  <chr>        <fct>     <dbl> <dbl> <dbl>
1 Advanced     ExamScore   100  82.1 1.02
2 Intermediate ExamScore   100  73.7 0.974
3 Standard     ExamScore   100  67.7 1.09
41.3 Conclusions / Write-up
We hypothesized that students’ exam performance would be influenced by both the classroom environment and the type of curriculum. To test these hypotheses we submitted exam scores to a 2 (Environment) \(\times\) 3 (Curriculum) ANOVA. The main effect of Environment was not significant, \(F\) (1, 296) = 2.10, \(p\) = .148, \(\eta_p^2\) = .007. The ANOVA revealed a significant main effect for Curriculum, \(F\) (2, 296) = 49.19, \(p\) < .001, \(\eta_p^2\) = .25. As seen in Figure 1 (below), Tukey post-hoc comparisons revealed significant differences among all three curriculum types (\(p\)s < .05). Exam scores were highest for students who took the Advanced curriculum (\(M \pm SE\): 82.1 ± 1.0), followed by the Intermediate (73.7 ± 1.0) and Standard (67.7 ± 1.1) curricula.
ggplot(learning_df, aes(x = ClassEnv, y = ExamScore, group = Curriculum)) +
stat_summary(fun.data = "mean_se", geom = "pointrange", aes(shape = Curriculum)) +
stat_summary(fun = "mean", geom = "line", aes(linetype = Curriculum)) +
theme_cowplot() +
theme(legend.position = c(.75, .18)) +
xlab("Classroom environment") + ylab("Exam score")