41 Simple Example Analysis

At this point I’d like to pause and run through a simple example analysis, step by step, based on some data we worked with in class. As with the last one of these, think of this as a tl;dr version of the previous two walkthroughs.

41.1 Evaluating Exam Performance in Different Learning Environments

In the wake of evolving educational systems, a school district is keen to optimize learning experiences and academic performance. The district has integrated both traditional classroom settings and online platforms for educational delivery. It also offers three curricula (Standard, Intermediate, and Advanced) to cater to students with varying academic abilities.

The educational board hypothesizes that classroom environment and curriculum type may have distinct impacts on students’ exam performance. To investigate this, they decide to conduct a study involving high school students from multiple schools within the district.

41.1.0.1 Objectives:

To determine whether a traditional classroom environment is more conducive to learning than an online setting, taking into account varying curricula.
To assess whether the level of curriculum (Standard, Intermediate, or Advanced) significantly affects exam scores, and whether this effect differs between traditional and online classrooms.

41.1.0.2 Participants:

Approximately 300 high school students participate in this study. Students are equally distributed across the two classroom environments and the three curriculum types (50 students per cell in the data used below).

41.1.0.3 Methodology:

Students are randomly assigned to one of the two classroom environments (Traditional or Online).
Within each classroom environment, students are further assigned to one of the three types of curriculum (Standard, Intermediate, Advanced).
At the end of the semester, students’ exam scores are recorded. The exam is standardized and is out of 100 points.

41.1.0.4 Data Collection:

Classroom Environment: Traditional or Online
Curriculum Type: Standard, Intermediate, Advanced
Exam Score: Ranging from 0 to 100

41.1.0.5 Data Analysis Plan:

Conduct a factorial ANOVA to investigate the main effects of classroom environment and curriculum type, as well as their interaction, on exam scores.
If significant interactions are found, conduct post-hoc tests to understand the nature of these interactions.

41.1.0.6 Anticipated Outcomes:

The educational board is particularly interested in identifying whether certain combinations of classroom environment and curriculum type yield significantly higher exam scores, as this information could guide future educational strategies within the district.

Summary of design

Independent Variables:

Classroom Environment (Traditional, Online)
Curriculum Type (Standard, Intermediate, Advanced)

Dependent Variable:

Exam Score (0-100)

41.2 Analysis

41.2.1 Dependencies

Packages:

pacman::p_load(rstatix,
               cowplot, 
               tidyverse, 
               emmeans,
               performance)

Data

learning_df <- read_csv("http://tehrandav.is/courses/statistics/practice_datasets/learning_environments.csv")

Rows: 300 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): ClassEnv, Curriculum
dbl (1): ExamScore

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

str(learning_df)

spc_tbl_ [300 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ ClassEnv  : chr [1:300] "Traditional" "Traditional" "Traditional" "Traditional" ...
 $ Curriculum: chr [1:300] "Standard" "Standard" "Standard" "Standard" ...
 $ ExamScore : num [1:300] 57 76 78 56 63 67 77 73 80 76 ...
 - attr(*, "spec")=
  .. cols(
  ..   ClassEnv = col_character(),
  ..   Curriculum = col_character(),
  ..   ExamScore = col_double()
  .. )
 - attr(*, "problems")=<externalptr>

learning_df %>% head()

# A tibble: 6 × 3
  ClassEnv    Curriculum ExamScore
  <chr>       <chr>          <dbl>
1 Traditional Standard          57
2 Traditional Standard          76
3 Traditional Standard          78
4 Traditional Standard          56
5 Traditional Standard          63
6 Traditional Standard          67
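Both grouping variables load as character columns, so R sorts their levels alphabetically (Advanced, Intermediate, Standard) in every table and plot. An optional step, not part of the analysis shown in this chapter, is to convert them to factors with an explicit order. The sketch below uses a small made-up data frame, `toy_df`, as a stand-in for `learning_df`; its rows are hypothetical.

```r
# Toy stand-in for learning_df (hypothetical rows, not the course data)
toy_df <- data.frame(
  ClassEnv   = c("Traditional", "Online", "Online", "Traditional"),
  Curriculum = c("Standard", "Advanced", "Intermediate", "Advanced"),
  ExamScore  = c(57, 88, 70, 79)
)

# Convert the character columns to factors with an explicit level order
toy_df$ClassEnv   <- factor(toy_df$ClassEnv,
                            levels = c("Traditional", "Online"))
toy_df$Curriculum <- factor(toy_df$Curriculum,
                            levels = c("Standard", "Intermediate", "Advanced"))

# Levels now follow the order given above rather than alphabetical order
levels(toy_df$Curriculum)
```

If you apply the same conversion to `learning_df`, expect the rows of the summary tables and the direction of the pairwise contrasts to be reordered relative to the output printed in this chapter, which uses the default alphabetical order.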

41.2.2 Descriptive Stats

Summary Table:

# table of cell means
learning_df %>%
  group_by(ClassEnv,Curriculum) %>%
  rstatix::get_summary_stats(type = "mean_se")

# A tibble: 6 × 6
  ClassEnv    Curriculum   variable      n  mean    se
  <chr>       <chr>        <fct>     <dbl> <dbl> <dbl>
1 Online      Advanced     ExamScore    50  85.5  1.38
2 Online      Intermediate ExamScore    50  71.6  1.35
3 Online      Standard     ExamScore    50  63.9  1.41
4 Traditional Advanced     ExamScore    50  78.7  1.35
5 Traditional Intermediate ExamScore    50  75.9  1.35
6 Traditional Standard     ExamScore    50  71.5  1.49
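Before plotting, it can help to read the cell means as differences. The short sketch below copies the six cell means from the table above by hand (the vector names `online` and `traditional` are just for illustration) and subtracts Traditional from Online within each curriculum. This is purely descriptive arithmetic, not an inferential test.

```r
# Cell means copied from the summary table above (n = 50 per cell)
online      <- c(Advanced = 85.5, Intermediate = 71.6, Standard = 63.9)
traditional <- c(Advanced = 78.7, Intermediate = 75.9, Standard = 71.5)

# Online minus Traditional within each curriculum
env_diff <- online - traditional
round(env_diff, 1)

# Overall Online minus Traditional difference (valid as a simple average
# because every cell has the same n)
round(mean(online) - mean(traditional), 1)
```

The differences come out to 6.8, -4.3, and -7.6 points for Advanced, Intermediate, and Standard respectively, with an overall difference of -1.7. The sign flip across curricula is the kind of pattern an interaction term is designed to test.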

Figure

ggplot(learning_df, aes(x = ClassEnv, y = ExamScore, group = Curriculum)) +
  stat_summary(fun.data = "mean_se", geom = "pointrange", aes(shape = Curriculum)) +
  stat_summary(fun = "mean", geom = "line", aes(linetype = Curriculum)) +
  theme_cowplot()

41.2.3 Build the model

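In the past I’ve recommended lm to reinforce that this is just another linear model. Moving forward I’m going to emphasize aov, which, as I mentioned before, is just a fancy wrapper for lm. Using aov provides the most consistent and efficient path for running ANOVA within the frameworks we’ve employed to this point. Note that the formula below joins the two factors with +, so this model estimates the main effects of ClassEnv and Curriculum only; testing the interaction described in the analysis plan would require ClassEnv * Curriculum instead.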

learning_model <- aov(ExamScore ~ ClassEnv + Curriculum, data = learning_df)
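To make the difference between `+` and `*` in a model formula concrete, here is a minimal sketch on simulated data. The data frame `sim_df` and its scores are random placeholders with the same 2 x 3 layout and 50 students per cell; they are not the course data, so only the structure of the two models is meaningful, not the numbers they produce.

```r
set.seed(1)  # make the simulated scores reproducible

# Simulated stand-in for learning_df: 2 x 3 design, 50 students per cell.
# These scores are random placeholders, NOT the course data.
sim_df <- expand.grid(
  id         = 1:50,
  ClassEnv   = c("Traditional", "Online"),
  Curriculum = c("Standard", "Intermediate", "Advanced")
)
sim_df$ExamScore <- rnorm(nrow(sim_df), mean = 70, sd = 10)

# `+` fits the two main effects only;
# `*` expands to both main effects plus the ClassEnv:Curriculum interaction
additive_model    <- aov(ExamScore ~ ClassEnv + Curriculum, data = sim_df)
interaction_model <- aov(ExamScore ~ ClassEnv * Curriculum, data = sim_df)

# Residual df: 300 - 4 = 296 without the interaction, 300 - 6 = 294 with it
df.residual(additive_model)
df.residual(interaction_model)

# The term labels show the extra ClassEnv:Curriculum term
attr(terms(interaction_model), "term.labels")
```

The residual degrees of freedom are a quick way to tell which model produced a given ANOVA table: 296 corresponds to the main-effects model and 294 to the model with the interaction.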

41.2.4 Test model assumptions

performance::check_normality(learning_model)

OK: residuals appear as normally distributed (p = 0.523).

performance::check_homogeneity(learning_model)

OK: There is not clear evidence for different variances across groups (Bartlett Test, p = 0.981).

41.2.5 Perform ANOVA

anova_test(learning_model, effect.size = "pes")

ANOVA Table (type II tests)

      Effect DFn DFd      F        p p<.05   pes
1   ClassEnv   1 296  2.104 1.48e-01       0.007
2 Curriculum   2 296 49.190 3.60e-19     * 0.249
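The pes column is partial eta squared, which can be recovered from each F ratio and its degrees of freedom. The sketch below copies the values from the table above by hand (the data frame `anova_tbl` and the column name `Fval` are just for illustration) and recomputes both the effect sizes and the p-values.

```r
# F ratios and degrees of freedom copied from the ANOVA table above
anova_tbl <- data.frame(
  Effect = c("ClassEnv", "Curriculum"),
  DFn    = c(1, 2),
  DFd    = c(296, 296),
  Fval   = c(2.104, 49.190)
)

# Partial eta squared:
# SS_effect / (SS_effect + SS_error) = (F * DFn) / (F * DFn + DFd)
anova_tbl$pes <- with(anova_tbl, (Fval * DFn) / (Fval * DFn + DFd))

# p-value for each F ratio from the upper tail of the F distribution
anova_tbl$p <- with(anova_tbl, pf(Fval, DFn, DFd, lower.tail = FALSE))

round(anova_tbl$pes, 3)
```

This returns 0.007 for ClassEnv and 0.249 for Curriculum, matching the pes column. The recomputed p-values should agree with the table up to the rounding of the F ratios.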

41.2.5.1 No main effect for ClassEnv

The main effect of ClassEnv was not significant (F(1, 296) = 2.10, p = .148), so there is nothing to follow up here. Moving on.

41.2.5.2 Main effect for Curriculum

Curriculum has three levels, so a significant main effect alone does not tell us which levels differ. We need to run a post hoc test:

emm_mdl <- emmeans(learning_model, specs = ~Curriculum)
emm_mdl %>% pairs()

 contrast                estimate   SE  df t.ratio p.value
 Advanced - Intermediate     8.34 1.45 296   5.743  <.0001
 Advanced - Standard        14.34 1.45 296   9.875  <.0001
 Intermediate - Standard     6.00 1.45 296   4.132  0.0001

Results are averaged over the levels of: ClassEnv 
P value adjustment: tukey method for comparing a family of 3 estimates
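To see roughly where the adjusted p-values come from, the sketch below recomputes them from the t ratios printed above. It assumes the usual form of the Tukey adjustment, in which |t| multiplied by the square root of 2 is referred to the studentized range distribution for a family of 3 means. Treat it as an illustration of the idea rather than a description of the emmeans internals.

```r
# t ratios and error df copied from the pairs() output above
t_ratio <- c(5.743, 9.875, 4.132)
names(t_ratio) <- c("Advanced - Intermediate",
                    "Advanced - Standard",
                    "Intermediate - Standard")
df_error <- 296
n_means  <- 3  # size of the family of estimates being compared

# Tukey adjustment: refer |t| * sqrt(2) to the studentized range distribution
p_tukey <- ptukey(abs(t_ratio) * sqrt(2), nmeans = n_means, df = df_error,
                  lower.tail = FALSE)

# Unadjusted two-sided p-values, for comparison
p_raw <- 2 * pt(abs(t_ratio), df = df_error, lower.tail = FALSE)

signif(p_tukey, 2)
signif(p_raw, 2)
```

The adjusted values should land close to the p.value column above. Small discrepancies are possible because the t ratios used here are rounded to three decimals.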

Means table for Curriculum (collapsed across ClassEnv):

learning_df %>%
  group_by(Curriculum) %>%
  get_summary_stats(type = "mean_se")

# A tibble: 3 × 5
  Curriculum   variable      n  mean    se
  <chr>        <fct>     <dbl> <dbl> <dbl>
1 Advanced     ExamScore   100  82.1 1.02 
2 Intermediate ExamScore   100  73.7 0.974
3 Standard     ExamScore   100  67.7 1.09
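Because every cell has the same n, each Curriculum mean in this table is simply the average of its two cell means. The sketch below checks that using the cell means from the Descriptive Stats table, copied by hand into a data frame named `cell_means` (a name chosen here for illustration).

```r
# Cell means copied from the Descriptive Stats table (n = 50 per cell)
cell_means <- data.frame(
  ClassEnv   = rep(c("Online", "Traditional"), each = 3),
  Curriculum = rep(c("Advanced", "Intermediate", "Standard"), times = 2),
  mean       = c(85.5, 71.6, 63.9, 78.7, 75.9, 71.5)
)

# With equal n per cell, each marginal mean is the simple average
# of the cell means at that level of Curriculum
marginal <- tapply(cell_means$mean, cell_means$Curriculum, mean)
round(marginal, 2)
```

This gives 82.1, 73.75, and 67.7, which agrees with the table above to within the rounding of the cell means. With unequal cell sizes the simple average would no longer match the raw group means, which is one reason to lean on emmeans for the comparisons.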

41.3 Conclusions / Write-up

We hypothesized that students’ exam performance would be influenced by both the classroom environment and the type of curriculum. To test these hypotheses, we submitted exam scores to a 2 (Environment) $\times$ 3 (Curriculum) ANOVA that included the two main effects. The main effect of Environment was not significant, $F$ (1, 296) = 2.10, $p$ = .148, $\eta_p^2$ = .007. The ANOVA revealed a significant main effect of Curriculum, $F$ (2, 296) = 49.19, $p$ < .001, $\eta_p^2$ = .25. Tukey post-hoc comparisons revealed significant differences among all three curriculum types ($ps$ < .001). As seen in Figure 1 (below), exam scores were highest for students who took the Advanced curriculum ($M \pm SE$: 82.1 ± 1.0), followed by the Intermediate (73.7 ± 1.0) and Standard (67.7 ± 1.1) curricula.

ggplot(learning_df, aes(x = ClassEnv, y = ExamScore, group = Curriculum)) +
  stat_summary(fun.data = "mean_se", geom = "pointrange", aes(shape = Curriculum)) +
  stat_summary(fun = "mean", geom = "line", aes(linetype = Curriculum)) +
  theme_cowplot() +
  theme(legend.position = c(.75, .18)) + 
  xlab("Classroom environment") + ylab("Exam score")