38 Example analysis
OK. So, I know we’ve covered a lot of material, and some of you have asked for a concise walkthrough of an analysis. This walkthrough doesn’t go into as much conceptual detail; instead, it focuses on how I would approach an analysis, including some of the practical things I do to make the analysis more concise and reproducible.
38.1 The data: Neuropsychological Performance Across Different Intervention Strategies for Dyslexia
Here’s a toy example of a dataset that I’ve created to illustrate the analysis.
38.1.1 Background:
Dyslexia is a neurological condition that affects reading, spelling, and occasionally arithmetic skills. Different intervention strategies may be applied to individuals with dyslexia to enhance their neuropsychological performance.
38.1.2 Objective:
To examine whether there is a statistically significant difference in neuropsychological performance among individuals with dyslexia who are exposed to three different intervention strategies.
38.1.3 Variables:
- Independent Variable: Intervention Strategy (three levels: Phonological, Multisensory, Computerized)
- Dependent Variable: Neuropsychological Performance Score (a continuous variable measured via a comprehensive assessment)
38.1.4 Hypotheses:
- Null Hypothesis (H0): There is no significant difference in neuropsychological performance scores across the three intervention strategies.
- Alternative Hypothesis (H1): There is a significant difference in neuropsychological performance scores across the three intervention strategies.
38.1.5 Design:
- Participants: 60 individuals diagnosed with dyslexia.
- Groups:
- Phonological Strategy Group (n=20): Individuals undergo traditional phonological-awareness training (this is your control group)
- Multisensory Strategy Group (n=20): Individuals engage in interventions involving visual, auditory, and kinesthetic (touch) stimuli.
- Computerized Strategy Group (n=20): Individuals utilize computer-based learning and practice programs.
38.1.6 Data Collection:
Each participant is assessed on their neuropsychological performance using a standardized neuropsychological assessment post-intervention.
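As an aside, if you wanted to simulate a dataset with this structure yourself (say, to practice the analysis), something like the sketch below would do it. The group means and SDs here are made up for illustration; they are not the values behind the course dataset.
pacman::p_load(tidyverse)
set.seed(42)  # arbitrary seed so the simulation is reproducible
sim_df <- tibble(
  part_num     = 1:60,
  intervention = rep(c("Phonological", "Multisensory", "Computerized"), each = 20),
  neuro_score  = c(rnorm(20, mean = 70, sd = 8),   # Phonological (hypothetical mean/sd)
                   rnorm(20, mean = 76, sd = 8),   # Multisensory (hypothetical mean/sd)
                   rnorm(20, mean = 84, sd = 8))   # Computerized (hypothetical mean/sd)
)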
38.2 The analysis
38.2.1 Step 1: Load in the default packages:
pacman::p_load(tidyverse,
               rstatix,
               cowplot,
               performance)
38.2.2 Step 2: Load in the data and take a look at it:
dyslexia_df <- read_csv("https://tehrandav.is/courses/statistics/practice_datasets/dyslexia_interventions_groups.csv")
Rows: 60 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): intervention
dbl (2): part_num, neuro_score
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(dyslexia_df)
spc_tbl_ [60 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ part_num : num [1:60] 1 2 3 4 5 6 7 8 9 10 ...
$ intervention: chr [1:60] "Phonological" "Phonological" "Phonological" "Phonological" ...
$ neuro_score : num [1:60] 66 68 77 78 69 86 67 72 82 63 ...
- attr(*, "spec")=
.. cols(
.. part_num = col_double(),
.. intervention = col_character(),
.. neuro_score = col_double()
.. )
- attr(*, "problems")=<externalptr>
38.2.3 Step 3: Perform any necessary data wrangling:
In this case the data looks like it’s in the right format (long) and most things look okay. However, the output above indicates that the intervention variable is a character vector (read_csv does this by default). At some point we will need it to be a factor, so it’s better to convert it now.
dyslexia_df <- dyslexia_df %>%
  mutate(intervention = as.factor(intervention))
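One optional extra (not part of this walkthrough): if you want Phonological, the control condition, to be the reference level of the factor, you can relevel it. The rest of the chapter keeps the default alphabetical ordering, so treat this as a sketch only.
# Optional: make Phonological the reference level (this affects the lm() intercept
# and default treatment contrasts; not done in this walkthrough)
dyslexia_df <- dyslexia_df %>%
  mutate(intervention = relevel(intervention, ref = "Phonological"))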
38.2.4 Step 4: Describe the data:
38.2.4.1 Descriptive statistics:
dyslexia_df %>%
  group_by(intervention) %>%
  get_summary_stats(neuro_score, type = "mean_se")
# A tibble: 3 × 5
intervention variable n mean se
<fct> <fct> <dbl> <dbl> <dbl>
1 Computerized neuro_score 20 85 1.81
2 Multisensory neuro_score 20 76.2 1.53
3 Phonological neuro_score 20 69.9 2.14
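For what it’s worth, you can get roughly the same table with plain dplyr, computing the standard error by hand:
dyslexia_df %>%
  group_by(intervention) %>%
  summarise(n    = n(),
            mean = mean(neuro_score),
            se   = sd(neuro_score) / sqrt(n()))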
38.2.4.2 Summary plot:
ggplot(dyslexia_df, aes(x = intervention, y = neuro_score)) +
stat_summary(fun.data = "mean_se") +
theme_cowplot()
38.2.5 Step 5: Create the model:
In order to do some of our testing later on, we need to save the model that we create. We’ll do this by creating a variable called dyslexia_model
and saving the model to that variable.
dyslexia_model <- lm(neuro_score ~ intervention, data = dyslexia_df)
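If you want to peek at the regression coefficients underlying this model (with R’s default treatment contrasts, the intercept is the mean of the first factor level, here Computerized, and the remaining coefficients are differences from it), summary() will show them:
summary(dyslexia_model)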
38.2.6 Step 6: Check the assumptions of the model:
38.2.6.1 Visual check of the assumptions:
performance::check_model(dyslexia_model, check = c("qq","homogeneity"))
38.2.6.2 Assumption 1: Normality
performance::check_normality(dyslexia_model)
OK: residuals appear as normally distributed (p = 0.561).
38.2.6.3 Assumption 2: Homogeneity of variance
performance::check_homogeneity(dyslexia_model)
OK: There is not clear evidence for different variances across groups (Bartlett Test, p = 0.347).
In this case, everything checks out, so we can keep this model.
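Had the homogeneity check failed, one common fallback is Welch’s ANOVA, which does not assume equal variances across groups. A minimal sketch using base R:
# Welch's one-way ANOVA (only needed if variances are clearly unequal)
oneway.test(neuro_score ~ intervention, data = dyslexia_df, var.equal = FALSE)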
38.2.7 Step 7: Run the ANOVA:
Since this is a simple one-way ANOVA, I just need generalized eta-squared (ges) for my effect size:
dyslexia_model %>% rstatix::anova_test(effect.size = "ges")
ANOVA Table (type II tests)
Effect DFn DFd F p p<.05 ges
1 intervention 2 57 16.891 1.74e-06 * 0.372
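A quick sanity check on that effect size: in a one-way between-subjects design, generalized eta-squared reduces to plain eta-squared, which you can compute by hand from the model’s ANOVA table as the effect sum of squares over the total sum of squares:
ss <- anova(dyslexia_model)$"Sum Sq"  # sums of squares for intervention and residuals
ss[1] / sum(ss)                       # eta-squared = SS_intervention / SS_total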
38.2.8 Step 8: Run the post-hoc tests:
38.2.8.1 8a: Create an estimated marginal means (emmeans) model:
I recommend saving the emmeans model to a variable (in this case I’m naming it post_hoc_emm) so that you can use it for different things later on.
pacman::p_load(emmeans)
post_hoc_emm <- emmeans(object = dyslexia_model, # the model
                        spec = ~ intervention)   # the structure of the comparisons; here we're comparing the levels of intervention
38.2.8.2 8b: Run a pairwise comparison:
To run a pairwise contrast, take the emmeans model, send it to pairs, and specify the adjustment method. In this case, we’re using the Tukey adjustment.
post_hoc_emm %>% pairs(adjust = "tukey")
contrast estimate SE df t.ratio p.value
Computerized - Multisensory 8.75 2.61 57 3.354 0.0040
Computerized - Phonological 15.10 2.61 57 5.788 <.0001
Multisensory - Phonological 6.35 2.61 57 2.434 0.0469
P value adjustment: tukey method for comparing a family of 3 estimates
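If you also want (Tukey-adjusted) confidence intervals around these pairwise differences, you can pipe the same comparisons into confint():
post_hoc_emm %>% pairs(adjust = "tukey") %>% confint()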
38.2.9 Step 9: Create a camera-ready plot:
In this case I’m going to do a violin plot with the means and standard errors plotted on top of it.
ggplot(dyslexia_df, aes(x = intervention, y = neuro_score)) +
geom_violin(fill = "lightgray") +
stat_summary(fun.data = "mean_se", size = 1) +
theme_cowplot() +
labs(x = "Intervention Strategy",
y = "Neuropsychological Performance Score") +
theme(axis.title.x = element_text(face = "bold"),
axis.title.y = element_text(face = "bold"))
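If you need to export this figure for a manuscript, ggsave() will write the most recent plot to disk. The file name and dimensions below are placeholders; adjust to taste.
ggsave("dyslexia_violin_plot.png", width = 6, height = 4, dpi = 300)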
38.3 Assuming that you are running planned contrasts
Let’s assume that I have an a priori hypothesis that any intervention is better than a phonological intervention. (A digression: perhaps I should have a no-intervention group as a proper experimental control, BUT knowing that it would be ethically improper to withhold an intervention from a group of people diagnosed with dyslexia, I’m going to assume that this is the best I can do.) I also want to know if a multisensory intervention is better than a computerized intervention.
To run a planned contrast I would repeat steps 1-6 above. I’m just going to stick that in a single chunk:
# Step 1: Load in the default packages:
pacman::p_load(tidyverse,
               rstatix,
               cowplot,
               performance,
               emmeans)
# Step 2: Load in the data and take a look at it:
dyslexia_df <- read_csv("https://tehrandav.is/courses/statistics/practice_datasets/dyslexia_interventions_groups.csv")
# Step 3: Data wrangling
# Convert intervention to a factor
dyslexia_df <- dyslexia_df %>%
  mutate(intervention = as.factor(intervention))
# Step 4: Describe the data:
## Descriptive statistics:
dyslexia_df %>%
  group_by(intervention) %>%
  get_summary_stats(neuro_score, type = "mean_se")
## Summary plot:
ggplot(dyslexia_df, aes(x = intervention, y = neuro_score)) +
stat_summary(fun.data = "mean_se") +
theme_cowplot()
# Step 5: Create the model:
dyslexia_model <- lm(neuro_score ~ intervention, data = dyslexia_df)
# Step 6: Check the assumptions of the model:
## Visual check of the assumptions:
performance::check_model(dyslexia_model, check = c("qq","homogeneity"))
## Assumption 1: Normality
performance::check_normality(dyslexia_model)
## Assumption 2: Homogeneity of variance
performance::check_homogeneity(dyslexia_model)
38.3.1 Step 7: Run our planned contrast:
For this to work, we need to have already converted our intervention variable to a factor (Step 3). If you haven’t done that, do it now.
38.3.1.1 Step 7a: get the ordering of the levels in the contrast:
By default, this should be alphabetical, but I like to check just in case:
levels(dyslexia_df$intervention)
[1] "Computerized" "Multisensory" "Phonological"
38.3.1.2 Step 7b: create the contrasts according to the ordering:
Remember to stick the planned contrasts into a list() object so you can use them later on.
my_planned_contrasts <- list(
  "Any Intervention vs. Phonological" = c(-1/2, -1/2, 1),
  "Multisensory vs. Computerized"     = c(0, -1, 1)
)
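A quick sanity check I like to run: the weights within each contrast should sum to zero.
sapply(my_planned_contrasts, sum)  # both should return 0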
38.3.1.3 Step 7c: create an emmeans object specifying the variable to be contrasted:
a_priori_emm <- emmeans(object = dyslexia_model,
                        spec = ~ intervention)
38.3.1.4 Step 7d: Perform the contrasts
Finally, submit the a priori emmeans object to the contrast function and specify the contrasts you want to run via the method argument. We may also specify a correction (Bonferroni) if we feel it is necessary.
In this case we’re going to run the two contrasts that we specified in Step 7b with a Bonferroni correction.
a_priori_emm %>%
  contrast(method = my_planned_contrasts, adjust = "bonferroni")
contrast estimate SE df t.ratio p.value
Any Intervention vs. Phonological -10.72 2.26 57 -4.747 <.0001
Multisensory vs. Computerized -6.35 2.61 57 -2.434 0.0362
P value adjustment: bonferroni method for 2 tests