4.6 Probability sampling methods: Stratified Random Sample


In a stratified random sample, a researcher still uses randomness, but in a way that guarantees representation across specifically designated groups, known as strata. This method ensures diversity across the sample and, when the members of each group resemble one another, usually leads to lower variance within each group than in the population as a whole. Why do we care about variance? Because variance drives the standard error, which in turn determines the margin of error and the width of the confidence interval, that is, the precision of your estimates.2
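To make the chain from variance to precision concrete, the standard textbook relationships for a sample mean can be written as below. The symbols are introduced here for illustration and do not come from the original text: $s$ is the sample standard deviation, $n$ the sample size, and $z^{*}$ the critical value for the chosen confidence level. The finite population correction is ignored.

$$
\mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{n}}, \qquad \mathrm{ME} = z^{*}\,\mathrm{SE}(\bar{x}), \qquad \mathrm{CI} = \bar{x} \pm \mathrm{ME}
$$

For a stratified sample with strata $h = 1, \dots, H$, population shares $W_h = N_h / N$, within-stratum variances $s_h^2$, and per-stratum sample sizes $n_h$, the corresponding standard error is

$$
\mathrm{SE}(\bar{x}_{st}) = \sqrt{\sum_{h=1}^{H} W_h^{2}\,\frac{s_h^{2}}{n_h}}
$$

so smaller within-stratum variances translate directly into a narrower confidence interval.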

To revisit the college scenario mentioned in the previous paragraph, our simple random sample would not guarantee equal representation across the four classes of students on campus. Using purely random selection, we might, for instance, arrive at a sample of 100 with 45 freshmen, 15 sophomores, 30 juniors, and 10 seniors. Perhaps such an imbalance would not pose a problem, but if we wanted to be sure of balance among the classes, we could use a stratified sample instead.

With a stratified sample, we would first divide the students into four separate pools to sample from (for the sake of this example, let’s assume that there are exactly 2500 freshmen, 2500 sophomores, 2500 juniors, and 2500 seniors on this campus). By randomly sampling one percent of the records from each of the four classes separately (25 students per class, 100 in total), we are guaranteed a sample with even representation across the classes.
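The campus example can be sketched with nothing more than Python's standard-library random module. This is a minimal illustration, not a definitive implementation: the roster, the student ID format, and the fixed seed are all assumptions made up for the example.

```python
import random

random.seed(42)  # assumption: fixed seed so the sketch is reproducible

CLASSES = ["freshman", "sophomore", "junior", "senior"]
STUDENTS_PER_CLASS = 2500
SAMPLING_FRACTION = 0.01  # one percent from each class

# Hypothetical roster: one list of made-up student IDs per class (stratum).
roster = {
    cls: [f"{cls}-{i:04d}" for i in range(STUDENTS_PER_CLASS)]
    for cls in CLASSES
}

# Sample each stratum independently, without replacement.
stratified_sample = {
    cls: random.sample(ids, k=round(len(ids) * SAMPLING_FRACTION))
    for cls, ids in roster.items()
}

for cls, ids in stratified_sample.items():
    print(cls, len(ids))
```

Because each class is sampled separately, the per-class counts are fixed in advance; only the identity of the selected students is left to chance.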

In Python, stratified random sampling can be done with the pandas library. Two students from each class can be chosen at random by combining the pandas groupby() and sample() methods: groupby() splits the records by class, and sample() then draws the requested number of rows from each group.
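A minimal sketch of this approach follows. The DataFrame, its column names (student, class_year), and the random_state value are hypothetical and chosen only for illustration; calling sample() on a grouped DataFrame requires pandas 1.1 or later.

```python
import pandas as pd

CLASS_YEARS = ["freshman", "sophomore", "junior", "senior"]

# Hypothetical roster: three made-up students in each of the four classes.
students = pd.DataFrame({
    "student": [f"student_{i:02d}" for i in range(1, 13)],
    "class_year": [year for year in CLASS_YEARS for _ in range(3)],
})

# Split the rows by class, then draw two rows at random from each group.
# random_state is fixed only so that the result is reproducible.
two_per_class = students.groupby("class_year").sample(n=2, random_state=1)

print(two_per_class)
```

Passing n fixes the number of rows drawn per group, so every class contributes exactly two students regardless of how large the class is.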

What if we only wanted to sample two-thirds of all students while retaining the proportions? Suppose we have 10 students: 60% of the students in our dataset are from Class A, and 40% are from Class B. By passing the ‘frac’ argument to sample() after grouping by class, we get approximately the sample size we want and also obtain proportions that are similar to those of the total population. The match is approximate rather than exact because the number of rows drawn from each class must be a whole number.
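The proportional case can be sketched as follows. As before, the DataFrame, its column names (student_id, class_name), and the random_state value are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical dataset: 10 students, 6 in Class A and 4 in Class B.
students = pd.DataFrame({
    "student_id": range(1, 11),
    "class_name": ["A"] * 6 + ["B"] * 4,
})

# Draw two-thirds of the rows from each class separately.
# random_state is fixed only so that the result is reproducible.
sampled = students.groupby("class_name").sample(frac=2 / 3, random_state=1)

# Compare class shares in the full dataset and in the sample.
population_share = students["class_name"].value_counts(normalize=True)
sample_share = sampled["class_name"].value_counts(normalize=True)

print(sampled)
print(population_share)
print(sample_share)
```

With groups this small, two-thirds of a class is not always a whole number of students, so the sample shares will be close to 60% and 40% rather than identical to them.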


2 https://stattrek.com/survey-research/stratified-sampling-analysis