6.18 Sample size

Power analysis is normally run before a study is conducted, most commonly to determine the minimum sample size needed for an experiment. In Python, the sample size can be calculated using the stats.power module within the statsmodels package. This module lets the user run a power analysis for common statistical tests, such as t-tests and the chi-square goodness-of-fit test, simply by specifying the significance level (alpha), the statistical power, and the effect size.

Recall the following:

The significance level (alpha) is the cut-off below which a p-value is deemed statistically significant. It is also the probability of making a Type 1 error, i.e. a false positive.
Statistical power refers to the probability of correctly rejecting a null hypothesis that is false. The higher the power, the lower the probability of making a Type 2 error, i.e. a false negative.
Effect size refers to the quantified magnitude of a difference between groups. It helps researchers determine whether a statistically significant result will make a practical difference in a real-world setting.

Consider how many participants we need in each group if we were to perform a two-sample t-test to detect a difference in means between two groups. Suppose our previous research tells us an effect size of 0.5 is realistic, and that we would like our test to have a statistical power of 0.8 and an alpha of 0.05. Solving for the sample size gives approximately 63.77 participants per group, which we round up to 64.
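The calculation described above can be sketched as follows. This is a minimal example, assuming a two-sided test with equal group sizes (ratio = 1); the variable names are illustrative, while TTestIndPower and solve_power come from statsmodels.stats.power.

```python
import math

from statsmodels.stats.power import TTestIndPower

# Power analysis object for an independent two-sample t-test.
analysis = TTestIndPower()

# Leave nobs1 unspecified so that solve_power solves for it.
# Assumptions: equal group sizes (ratio=1.0) and a two-sided alternative.
n_per_group = float(
    analysis.solve_power(
        effect_size=0.5,   # standardized effect size (Cohen's d)
        alpha=0.05,        # significance level
        power=0.8,         # desired statistical power
        ratio=1.0,
        alternative="two-sided",
    )
)

# A sample size must be a whole number, so round up rather than down.
n_rounded = math.ceil(n_per_group)

print(f"Required sample size per group: {n_per_group:.2f}")
print(f"Rounded up: {n_rounded}")
```

Rounding up rather than to the nearest integer keeps the achieved power at or above the target.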

To see how effect size, sample size, statistical power, and alpha are related, consider the power curves below. Power curves show how changes in effect size and sample size affect statistical power. The chart shows that if we expect a large effect (effect size = 0.8), there is little benefit in recruiting more than about 50 participants, because power is already high at that point and each additional participant yields diminishing returns.
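The relationship behind such power curves can also be checked numerically. The sketch below assumes the same two-sample t-test with alpha = 0.05 and treats the sample size as the number of participants per group; the effect sizes 0.2, 0.5, and 0.8 and the grid of sample sizes are illustrative choices, not values taken from the original chart.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

effect_sizes = [0.2, 0.5, 0.8]          # small, medium, large (illustrative)
sample_sizes = list(range(10, 101, 10))  # participants per group

# For each effect size, compute the power at every sample size.
power_table = {
    d: [
        float(analysis.power(effect_size=d, nobs1=float(n), alpha=0.05))
        for n in sample_sizes
    ]
    for d in effect_sizes
}

# Print one row per sample size and one column per effect size.
print("n per group  " + "  ".join(f"d={d:<4}" for d in effect_sizes))
for i, n in enumerate(sample_sizes):
    row = "  ".join(f"{power_table[d][i]:.3f} " for d in effect_sizes)
    print(f"{n:>11}  {row}")
```

To draw the curves themselves, TTestIndPower also provides a plot_power method that takes arrays of sample sizes and effect sizes and requires matplotlib.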

Please refer to the statsmodels documentation for a list of power and sample size calculations for other tests, such as the chi-square goodness-of-fit test.
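As an illustration of one of those other tests, the sketch below solves for the total sample size of a chi-square goodness-of-fit test using GofChisquarePower from statsmodels.stats.power. The effect size of 0.3 and the three categories (n_bins=3) are hypothetical values chosen only for demonstration.

```python
import math

from statsmodels.stats.power import GofChisquarePower

gof = GofChisquarePower()

# Leave nobs unspecified so that solve_power solves for the total sample size.
# Hypothetical inputs: effect size 0.3 and 3 categories.
n_total = float(
    gof.solve_power(effect_size=0.3, alpha=0.05, power=0.8, n_bins=3)
)
n_total_rounded = math.ceil(n_total)

# Check the power actually achieved with the rounded-up sample size.
achieved_power = float(
    gof.power(effect_size=0.3, nobs=n_total_rounded, alpha=0.05, n_bins=3)
)

print(f"Required total sample size: {n_total:.2f} (rounded up: {n_total_rounded})")
print(f"Achieved power: {achieved_power:.3f}")
```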