
1.12 Grouping Data To Take Description to a Deeper Level


Using describe() to put a particular variable in context, as shown in the previous section, is a good start. However, we can often learn even more by going a level deeper into the data.

Intuitively, we might expect Lobster Land’s revenue to differ by day of the week. Using the groupby() method from pandas, we can see that Lobster Land’s daily gross revenue does peak around the weekend, with the highest average occurring on Fridays. Mondays do relatively well compared to the other weekdays (perhaps long weekends and holidays help to influence this).
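As a minimal sketch of this kind of grouping, the snippet below averages revenue by day of the week. The DataFrame, the column names `day_of_week` and `gross_revenue`, and every value in it are hypothetical stand-ins, not Lobster Land's actual data.

```python
import pandas as pd

# Hypothetical data: column names and values are invented for illustration.
df = pd.DataFrame({
    "day_of_week": ["Monday", "Tuesday", "Friday", "Saturday",
                    "Monday", "Tuesday", "Friday", "Saturday"],
    "gross_revenue": [90000, 70000, 150000, 140000,
                      110000, 80000, 160000, 150000],
})

# Split the rows by day, then take the mean revenue within each group.
avg_by_day = df.groupby("day_of_week")["gross_revenue"].mean()
print(avg_by_day)
```

Because the day names here are plain strings, the groups come back sorted alphabetically rather than in calendar order.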

Digging into the data with this groupby() operation helps us put results into context even further. We now know that the $165,000 figure mentioned in the previous section is impressive regardless of the day of the week, but it would be especially impressive if it occurred on a Monday, Tuesday, Wednesday, or Thursday.

When we perform a grouping such as the one shown above, the levels of the grouped variable appear in sorted order by default, which for text labels means alphabetically. Since the weekdays’ alphabetical positioning is unrelated to their actual order within a week, this default presentation could be confusing for an audience. With pandas’ cat.reorder_categories() method, we can reorder the levels of any variable stored with the categorical data type, as shown below.
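One way the reordering might look is sketched below, again using hypothetical column names and invented values. Note that reorder_categories() expects the column to already be categorical and the new list to contain exactly the existing categories, so the sample data includes all seven days.

```python
import pandas as pd

week_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
              "Friday", "Saturday", "Sunday"]

# Hypothetical data: two invented observations for each day of the week.
df = pd.DataFrame({
    "day_of_week": week_order * 2,
    "gross_revenue": [90000, 70000, 72000, 80000, 150000, 140000, 120000,
                      110000, 80000, 78000, 84000, 160000, 150000, 130000],
})

# Convert the text column to a categorical, then put its levels in calendar order.
df["day_of_week"] = df["day_of_week"].astype("category")
df["day_of_week"] = df["day_of_week"].cat.reorder_categories(week_order, ordered=True)

# observed=False keeps every category in the result, even one with no rows.
avg_by_day = df.groupby("day_of_week", observed=False)["gross_revenue"].mean()
print(avg_by_day)
```

The grouped result now follows the category order, Monday through Sunday, instead of alphabetical order.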

The resulting look at the average revenue by day of week is much easier to understand and interpret.

To see more granular summary statistics for the day-by-day revenue comparison, we could use describe() rather than mean(), as shown below:
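A rough sketch of that swap follows. The column names and values are hypothetical, and the Monday figures are deliberately invented to include one unusually large value so that the effect on the standard deviation is visible.

```python
import pandas as pd

# Hypothetical data: Monday includes one invented holiday-sized spike.
df = pd.DataFrame({
    "day_of_week": ["Monday", "Monday", "Monday",
                    "Tuesday", "Tuesday", "Tuesday",
                    "Saturday", "Saturday", "Saturday"],
    "gross_revenue": [60000, 65000, 190000,
                      70000, 72000, 74000,
                      140000, 145000, 150000],
})

# describe() returns count, mean, std, min, quartiles, and max for each group.
stats_by_day = df.groupby("day_of_week")["gross_revenue"].describe()
print(stats_by_day)

# Identify the day whose revenue varies the most.
most_variable_day = stats_by_day["std"].idxmax()
print(most_variable_day)
```

Each row of the result is one day, and each column is one summary statistic, which makes it easy to compare the spread of revenue across days as well as the averages.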

With more detail about the day-to-day stats, we can now see that Monday stands out for its unusually high standard deviation. Standard deviation will be covered in more depth later in this chapter, but in short, it is a measure of how widely the values of a variable are spread around their mean. An important piece of contextual detail here is that Lobster Land’s annual season starts and ends on two major holidays, Memorial Day and Labor Day, each of which falls on a Monday. The Monday group therefore likely mixes those holiday Mondays with ordinary ones, which would help explain the wider spread.