
1.15 Summary Stats: Mean, Median, Mode, and Range


Let’s take a look at 15 consecutive days’ worth of sales of Lobster Land 50th anniversary commemorative t-shirts at the park’s merchandise store.

Day #                     | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 | 15
Commemorative Shirt Sales | 56 | 18 | 22 | 41 | 71 | 55 | 49 | 61 | 58 | 52 | 53 | 61 | 19 | 70 | 56

The arithmetic mean of a collection of n numeric values can be found by summing those values, and then dividing that sum by n.

In this case, we have 15 elements, and the sum of those elements is: 56 + 18 + 22 + 41 + 71 + 55 + 49 + 61 + 58 + 52 + 53 + 61 + 19 + 70 + 56 = 742.  We can take that sum and divide it by 15 to arrive at a mean value of approximately 49.47.

Summing each of our elements by hand, though, is not only tedious, but also error-prone.  Thankfully, numpy can help us out with this!

After storing the sales figures from the table above in a numpy array named shirt_sales, we can call np.mean() on that object to find the average number of shirts sold per day.
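As a minimal sketch of that step, the snippet below builds the array from the 15 daily values in the table and computes the mean.  The variable names total and mean_sales are chosen here for illustration; only shirt_sales and np.mean() come from the text.

```python
import numpy as np

# Daily commemorative shirt sales for days 1 through 15, copied from the table
shirt_sales = np.array([56, 18, 22, 41, 71, 55, 49, 61, 58, 52, 53, 61, 19, 70, 56])

# Sum of all 15 values, matching the hand calculation (742)
total = int(shirt_sales.sum())

# Arithmetic mean: the sum divided by the number of elements
mean_sales = float(np.mean(shirt_sales))

print(total)
print(round(mean_sales, 2))
```

The printed mean should agree with the hand calculation: 742 divided by 15, or roughly 49.47.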

A median can be thought of as a “middle” value; it can also be described as the 50th percentile of the data.  To find a median, first sort the elements in order.  If there is an odd number of elements, the median is the ((n+1)/2)th element.  If there is an even number of elements, the median is found by taking the arithmetic mean of the (n/2)th and the ((n/2)+1)th elements.  With a set of 5 ordered elements, the median would be the 3rd one; with a set of 6 ordered elements, the median would be the mean of elements 3 and 4.  
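The sort-then-pick-the-middle rule can be sketched for the shirt sales data as follows.  The manual calculation is written out only to mirror the description; np.median() gives the same result in one call.  The names ordered, manual_median, and numpy_median are illustrative.

```python
import numpy as np

# Daily commemorative shirt sales for days 1 through 15
shirt_sales = np.array([56, 18, 22, 41, 71, 55, 49, 61, 58, 52, 53, 61, 19, 70, 56])

# Step 1: sort the elements in ascending order
ordered = np.sort(shirt_sales)
n = len(ordered)

# Step 2: pick the middle element (odd n) or average the two middle elements (even n).
# Python indexes from 0, so the k-th element lives at index k - 1.
if n % 2 == 1:
    manual_median = float(ordered[(n + 1) // 2 - 1])
else:
    manual_median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# The same result in a single numpy call
numpy_median = float(np.median(shirt_sales))

print(manual_median, numpy_median)
```

With 15 elements, the median is the 8th value in the sorted list, which is 55.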

A median is often more appropriate than a mean when the data contains extreme outliers.  Imagine a street called Sturbridge Drive, located in a middle-class neighborhood with 25 homes.  24 of these homes are very similar to one another, each valued at approximately USD $250,000.  At the end of Sturbridge Drive, however, is a mansion valued at approximately USD $18,000,000.

Above: Stock image of a mansion [7]

If someone asks us, “What is the average price of a home on this street?” we would not be lying if we responded with the mean value, USD $960,000.  However, that figure is misleading.  Someone looking to purchase a house on Sturbridge Drive might be better served by learning that the median value is USD $250,000.  
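To see how a single outlier pulls the mean away from the median, here is a small sketch of the Sturbridge Drive example.  It assumes, for simplicity, that each of the 24 similar homes is valued at exactly USD $250,000 (the text says “approximately”); the array name home_values is made up for this illustration.

```python
import numpy as np

# Assumption: 24 homes at exactly $250,000 each, plus one $18,000,000 mansion
home_values = np.array([250_000] * 24 + [18_000_000])

# The mean is pulled upward by the single mansion
mean_value = float(np.mean(home_values))

# The median ignores how extreme the largest value is
median_value = float(np.median(home_values))

print(mean_value)
print(median_value)
```

Under that assumption, the mean works out to $960,000 while the median stays at $250,000.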

The inter-quartile range, or IQR, is the difference between the 75th and 25th percentiles; it measures the spread of the middle 50% of a dataset.  The 75th percentile is the value below which 75% of the observations fall; likewise, the 25th percentile is the value below which 25% of the observations fall.  Half of the observations fall between these two percentiles.  Like the median, the IQR is “robust to outliers.”
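One way to compute the IQR for the shirt sales data is with np.percentile(), as sketched below.  Note that there are several conventions for estimating percentiles from a small sample; this sketch relies on numpy’s default (linear interpolation between neighboring sorted values), so other tools or textbooks may report slightly different quartiles.  The names q1, q3, and iqr are illustrative.

```python
import numpy as np

# Daily commemorative shirt sales for days 1 through 15
shirt_sales = np.array([56, 18, 22, 41, 71, 55, 49, 61, 58, 52, 53, 61, 19, 70, 56])

# 25th and 75th percentiles, using numpy's default linear interpolation
q1, q3 = np.percentile(shirt_sales, [25, 75])

# The IQR is the distance between those two percentiles
iqr = float(q3 - q1)

print(q1, q3, iqr)
```

With numpy’s default method, the 25th percentile is 45.0 and the 75th percentile is 59.5, giving an IQR of 14.5.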

The mode of a set of values is the most frequently occurring value from the group.  In the t-shirt example above, 56 and 61 each occur twice while every other value occurs only once, so the data has two modes: 56 and 61.  In the house price example above, the mode value is USD $250,000.  
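numpy has no dedicated mode function, so the sketch below counts occurrences with np.unique() and keeps every value that ties for the highest count.  This is just one possible approach, and the names values, counts, and modes are illustrative.

```python
import numpy as np

# Daily commemorative shirt sales for days 1 through 15
shirt_sales = np.array([56, 18, 22, 41, 71, 55, 49, 61, 58, 52, 53, 61, 19, 70, 56])

# Distinct values (sorted) and how many times each one occurs
values, counts = np.unique(shirt_sales, return_counts=True)

# Keep every value whose count equals the highest count, so ties are preserved
modes = values[counts == counts.max()]

print(modes.tolist())
```

Because ties are kept, both 56 and 61 are returned for the shirt sales data.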

The range of a set of values is found by subtracting the lowest value from the highest value.  For the commemorative shirt sales values shown above, the range is 53, which we can obtain by subtracting the lowest total (18) from the highest (71).
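The same subtraction can be sketched in numpy, either by taking the maximum minus the minimum directly or by calling np.ptp() (“peak to peak”), which performs that calculation in one step.  The names sales_range and ptp_range are illustrative.

```python
import numpy as np

# Daily commemorative shirt sales for days 1 through 15
shirt_sales = np.array([56, 18, 22, 41, 71, 55, 49, 61, 58, 52, 53, 61, 19, 70, 56])

# Range: highest value minus lowest value
sales_range = int(shirt_sales.max() - shirt_sales.min())

# np.ptp computes the same max-minus-min difference
ptp_range = int(np.ptp(shirt_sales))

print(sales_range, ptp_range)
```

Both approaches give 53 for the shirt sales data, matching the hand calculation of 71 minus 18.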


7 https://www.pexels.com/photo/outdoor-fountain-242246/