pacman::p_load(tidyverse, cowplot, glue)
31 Power
Screams from the haters, got a nice ring to it I guess every superhero need his theme music
Power is the answer to the second question from the previous section: “How strong would a potential treatment effect in a future experiment need to be in order for us to detect it?” More precisely, it is the likelihood that a study will detect an effect when there is an effect to detect. Mathematically, we can express power as \(1 - \beta\), where \(\beta\) is the Type II error rate (Cohen, 1962, 1988, 1992). Below is a visual representation of power.
(Figure: a visual representation of power.)
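To make this definition concrete, power can also be approximated by simulation: run the same experiment many times on a population where the effect truly exists, and power is simply the proportion of those experiments that come out significant. Below is a minimal sketch; the group size (50 per group), effect size (d = .4), and number of simulations are arbitrary choices.

set.seed(1)
n_sims <- 5000
p_values <- replicate(n_sims, {
  control   <- rnorm(50, mean = 0, sd = 1)   # null group
  treatment <- rnorm(50, mean = .4, sd = 1)  # true effect of d = .4
  t.test(control, treatment)$p.value
})
mean(p_values < .05)  # empirical power: the share of significant results

With these settings the estimate lands near .5, matching the analytic power we compute for this same design later in the chapter.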
31.1 Truth and no truth to power
In your travels you are likely to encounter two kinds of power estimates: a priori power and a posteriori power. A priori power fits with our description above, and is calculated before one conducts the experiment. A posteriori power is calculated after the experiment is completed, and asks, given the observed effect size, what power you actually achieved. One of these two is better than the other… in fact, just say no to a posteriori, or post-hoc / observed, power (the latter term shows up in SPSS). As a measure, it is so directly tied to your obtained p-value as to be rendered essentially meaningless, and it does not really answer the question that power analysis intends to address.
31.2 Finding power
The calculation of power takes into consideration 3 factors:
- what is the size of the effect of your treatment / manipulation
- how large a sample do you have to amplify that effect
- what is the criterion by which you will decide whether the effect was detected
To squeeze every last drop out of the radio metaphor we can think of this as:
- how strong is the original signal relative to the noise
- how much of a boost can you give that signal (how many antennae it is emitted from)
- how sensitive is the receiving antenna that tunes in the signal and tunes out the noise
In statistical terms these are:
- your effect size
- your sample size
- your \(\alpha\) (alpha) value (e.g., \(p < .05\))
As experimenters, we can manipulate any of these three components to increase the power of our design, though due to practical or theoretical constraints we tend to focus on sample size and alpha value. Assuming that we allow effect size to remain constant (that is, we don’t try to enhance or exaggerate the effect), we can calculate power as follows:
Calculating a priori power involves setting a specific likelihood (e.g., power = .80) at a given \(\alpha\) level (e.g., .05), given a known effect size. This is typically used to estimate the sample size needed to achieve that power (i.e., how much of a signal boost we need). A key problem here is determining the proper effect size. One suggestion is to survey the literature for independent experiments and meta-analyses and use their results as a guide. BUT, one important thing to consider is that due to publication bias those results are likely to be overestimates (only effects that are significant get reported; see Kicinski et al., 2015). Another rule of thumb is to assume your effect is small-to-medium (d = .3 or .4).
Post-hoc power is estimated given the observed effect size you found in your study, the sample size, and the \(\alpha\) you set. As I mentioned above, post-hoc power doesn’t really have much value. See here for a more detailed discussion.
31.3 Estimating a priori Power
Here we aim to ensure a specific likelihood that we will find a significant effect (\(1-\beta\)). There are two likelihood values that are often chosen for power: .80 and .95. 80% power is the value suggested by Cohen because it’s economical. 95% is called high power, and it has been favored of late (but it’s very expensive, and you will see why). To calculate power, we need 3 pieces of information: 1) the effect size (in Cohen’s \(d\)), 2) your \(\alpha\) level, and 3) the number of tails you want to set (usually 2-tailed).
To calculate power in R we can use the pwr package. It may already be installed on your computer; if not, you can install it with install.packages("pwr"). In our explanation of power thus far we have discussed it in the context of comparing two means, as we would in a t-test. Let’s start by performing a power analysis using this example.
For example, let’s assume that we are attempting to achieve a power of .80 given an effect size (d) of .4 and an \(\alpha\) of .05 when comparing two means in an independent samples t-test. Here we would use the pwr.t.test function in the pwr package. pwr.t.test works by solving for whatever argument you leave as NULL.
library(pwr)
pwr.t.test(n = NULL, d = .4, power = .80, sig.level = 0.05,
type = c("two.sample"), alternative = c("two.sided"))
Two-sample t test power calculation
n = 99.08032
d = 0.4
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: n is number in *each* group
Here it is telling me that in order to achieve my desired power I would need to recruit 100 participants for each group (rounding 99.08 up). Conversely, if this were a paired samples t-test:
pwr.t.test(n = NULL, d = .4, power = .80, sig.level = 0.05,
type = c("paired"), alternative = c("two.sided"))
Paired t test power calculation
n = 51.00945
d = 0.4
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: n is number of *pairs*
I would need to recruit 52 participants (rounding 51.01 up), again showing that repeated-measures designs are more powerful.
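Since pwr.t.test returns its results as a list, you can also extract the solved n programmatically and round it up (rounding down would leave the design slightly underpowered). A quick sketch:

library(pwr)
# $n holds the solved sample size; ceiling() rounds UP so power stays at least .80
ceiling(pwr.t.test(d = .4, power = .80, sig.level = .05,
    type = "two.sample", alternative = "two.sided")$n)  # 100 per group
ceiling(pwr.t.test(d = .4, power = .80, sig.level = .05,
    type = "paired", alternative = "two.sided")$n)      # 52 pairs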
What happens if I bump my desired power up to .95?
pwr.t.test(n = NULL, d = .4, power = .95, sig.level = 0.05,
type = c("paired"), alternative = c("two.sided"))
Paired t test power calculation
n = 83.16425
d = 0.4
sig.level = 0.05
power = 0.95
alternative = two.sided
NOTE: n is number of *pairs*
pwr.t.test(n = NULL, d = .4, power = .95, sig.level = 0.05,
type = c("two.sample"), alternative = c("two.sided"))
Two-sample t test power calculation
n = 163.4006
d = 0.4
sig.level = 0.05
power = 0.95
alternative = two.sided
NOTE: n is number in *each* group
In this case I need to add about 32 more participants in the paired-samples case and about 64 more participants to EACH group in the independent-samples case.
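To see just how steeply the cost climbs, we can map over a few target power levels (the values below are illustrative choices) and extract the required per-group n for the independent-samples case:

# required n per group at several target power levels (d = .4)
sapply(c(.80, .90, .95, .99), function(pw) {
  ceiling(pwr.t.test(d = .4, power = pw, sig.level = .05,
      type = "two.sample", alternative = "two.sided")$n)
})
# the required n per group climbs steeply as the target power approaches 1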
31.4 Sample size and Power (diminishing returns)
Sample size and power at a given effect size have a non-linear relationship, such that you eventually hit a wall in how much of an increase in power you get with each increase in sample size. Keeping in mind that pwr.t.test works by solving for whatever you leave as NULL, we can calculate power given our estimated effect size, our alpha, and our likely sample size. Say, for example, we have a treatment with an estimated effect of \(d = .40\). Given our budget, we are limited to collecting no more than 100 participants total (50 in each group). We can then ask ourselves how powerful our design is (what is the likelihood of detecting an effect given that it is there).
pwr.t.test(n = 50, d = .4, power = NULL, sig.level = 0.05,
type = c("two.sample"), alternative = c("two.sided"))
Two-sample t test power calculation
n = 50
d = 0.4
sig.level = 0.05
power = 0.5081857
alternative = two.sided
NOTE: n is number in *each* group
Here our power is .51, indicating that we have a 51% chance of detecting an existing effect. This study may not be worth the time, money, and effort. What if we could run this as a paired test:
pwr.t.test(n = 50, d = .4, power = NULL, sig.level = 0.05,
type = c("paired"), alternative = c("two.sided"))
Paired t test power calculation
n = 50
d = 0.4
sig.level = 0.05
power = 0.7917872
alternative = two.sided
NOTE: n is number of *pairs*
Now our power approaches 80%. This might be a good study to run.
Let’s take a look at some power curves, with a priori power as a function of sample size:
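A sketch of how such curves can be generated with pwr and ggplot2; the two effect sizes (d = .4 and d = .8) and the range of sample sizes are illustrative choices:

pacman::p_load(pwr, tidyverse)
# a priori power for a two-sample t-test across per-group sample sizes
power_curves <- tidyr::expand_grid(n = seq(5, 200, by = 5), d = c(.4, .8)) %>%
  dplyr::mutate(power = purrr::map2_dbl(n, d,
    ~ pwr.t.test(n = .x, d = .y, sig.level = .05, type = "two.sample")$power))
ggplot(power_curves, aes(x = n, y = power, color = factor(d))) +
  geom_line(linewidth = 1) +
  geom_hline(yintercept = .80, linetype = "dashed") +  # conventional .80 target
  labs(x = "n (per group)", y = "a priori power", color = "Cohen's d")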
As you can see, the larger the effect, the fewer subjects you need. At the same time, both curves plateau, suggesting that above a certain sample size your gains in power are inconsequential.
31.5 Power and effect size
In a similar vein, we can address this question: what is the smallest effect size that we can hope to detect given our design? Let’s assume we can collect no more than 100 participants and desire a power level of .80.
Again using pwr.t.test, setting our desired power at .80 and leaving d NULL:
pacman::p_load(pwr)
pwr.t.test(n = 100, d = NULL, power = .8, sig.level = 0.05,
type = c("paired"), alternative = c("two.sided"))
Paired t test power calculation
n = 100
d = 0.2829005
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: n is number of *pairs*
Given these criteria, I should be able to detect an effect as small as \(d\) = 0.28.
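The same call can also be mapped over several candidate sample sizes (the values below are arbitrary) to see how the minimum detectable effect shrinks as n grows:

# smallest detectable d at power = .80 (paired) for each candidate n
sapply(c(25, 50, 100, 200), function(n) {
  pwr.t.test(n = n, d = NULL, power = .80, sig.level = .05,
      type = "paired", alternative = "two.sided")$d
})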
31.6 Regression example
We can also estimate the power of simple regression analyses using this package and the pwr.f2.test function. In this case we need to know our:
- desired power
- numerator degrees of freedom, u
- denominator degrees of freedom, v
- effect size, f2, where Cohen’s \(f^2 = R^2 / (1 - R^2)\)
- and our significance level
As above, whatever is left NULL is what’s calculated. For example, how many participants would we need to detect an effect of \(f^2 = .39\) (an \(R^2\) of about .28), at a desired power of .80, assuming a simple regression (where our numerator df = 1)?
pwr.f2.test(u = 1, v = NULL, f2 = .39, power = .80)
Multiple regression power calculation
u = 1
v = 20.22032
f2 = 0.39
sig.level = 0.05
power = 0.8
This output tells us that our denominator df would need to be about 20 (21 after rounding up). Knowing that in simple regression \(v = N - 2\), this suggests I would need 23 participants.
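One caution when pulling effect sizes from the regression literature: pwr.f2.test expects Cohen’s \(f^2\), not \(R^2\) itself. The two are related by \(f^2 = R^2 / (1 - R^2)\), so the conversion is a one-liner:

R2 <- .28
f2 <- R2 / (1 - R2)  # Cohen's f^2 = R^2 / (1 - R^2)
f2                   # ~ .39, the value passed to pwr.f2.test() above
f2 / (1 + f2)        # and back to R^2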
You can check out this link for more detail about the pwr package.