14.1 Ride Popularity at Lobster Land

The dataset ride_ratings can be found at lobsterland.net/datasets. This dataset comes from a group of 45 park enthusiasts, who each volunteered to go on up to 12 rides, giving each one an overall quality rating from 1 (did not like it, and/or would not ride again) up to 6 (loved the ride and would gladly queue up for it again in the future). There are a few NaNs scattered throughout this data – these indicate that the reviewer did not go on that attraction during the day that he or she was scoring the rides.

After first importing pandas, we bring this dataset into our environment and take a peek at its first five rows. You may notice here that some of our columns’ values appear as integers, whereas others are floats. The columns that do not contain any NaN values appear here as integers. For any calculations we perform, the distinction will not matter.
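The import step might look roughly like the sketch below. The real file is not reproduced here, so a tiny in-memory CSV stands in for it; the ride columns and every rating value are invented for illustration, and the variable name `ratings` is our own choice. With the downloaded file, the call would presumably be `pd.read_csv` pointed at wherever the file was saved.

```python
import io

import pandas as pd

# Hypothetical stand-in for the ride_ratings file: the column layout and all
# values below are made up for illustration, not taken from the real dataset.
csv_text = """rider_id,Ferris Bueller,Lobster Claw,Pirate Ship
1,5,6,3
2,4,,2
3,5,1,
"""

# With the real file, pass its path to pd.read_csv instead of a StringIO buffer.
ratings = pd.read_csv(io.StringIO(csv_text))

print(ratings.head())   # up to five rows (this toy sample only has three)
print(ratings.dtypes)   # complete columns stay int64; columns with NaN become float64
```

The `dtypes` output illustrates the integer/float split described above: a single missing value is enough for pandas to store the whole column as floats.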

Before getting into any analysis, we’ll move rider_id to the index position. With rider_id as the index, every remaining column holds ride ratings, so column-wise calculations will not mistakenly include the ID values.
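A minimal sketch of that step, using a small invented DataFrame in place of the real data (only the `rider_id` column name comes from the text; the ratings are made up):

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; these are not the real ratings.
ratings = pd.DataFrame(
    {
        "rider_id": [1, 2, 3],
        "Ferris Bueller": [5, 4, 5],
        "Lobster Claw": [6, np.nan, 1],
    }
)

# set_index returns a new DataFrame, so we assign the result back.
ratings = ratings.set_index("rider_id")

print(ratings.columns.tolist())  # only the ride columns remain
```

Had we left rider_id as an ordinary column, a call such as `ratings.mean()` would have reported an "average rider ID" alongside the ride averages.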

To assess the rides’ comparative popularity, we could start by simply viewing the average scores per ride.
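One way to produce such a ranking is sketched below, again on invented numbers rather than the real ratings. It relies on the fact that `DataFrame.mean` works column by column and skips NaN values by default, so each ride is averaged over only the reviewers who actually rated it.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; these are not the real ratings.
ratings = pd.DataFrame(
    {
        "Ferris Bueller": [5, 4, 5],
        "Lobster Claw": [6, np.nan, 3],
        "Pirate Ship": [3, 2, np.nan],
    },
    index=pd.Index([1, 2, 3], name="rider_id"),
)

# Column-wise mean (NaNs ignored), sorted from highest to lowest.
mean_scores = ratings.mean().sort_values(ascending=False)

print(mean_scores)
```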

This sorting shows us that the Ferris Bueller and the Lobster Claw stand tall above the rest. There’s a pretty densely packed group in the middle, and we can also see that the Pirate Ship and Twisty Slide bring up the rear here.

Does this mean we should recommend the Ferris Bueller and the Lobster Claw to all visitors? Not necessarily! We may want to dig a bit deeper into the data first.

Besides looking at a centrality measure like a mean or median, we may also want to consider a dispersion measure, such as standard deviation.
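A dispersion ranking can be sketched the same way as the mean ranking. The numbers below are invented to contrast a polarizing ride with a steady one; they are not the real ratings. Note that `DataFrame.std` in pandas computes the sample standard deviation (`ddof=1`) by default and, like `mean`, skips NaN values.

```python
import pandas as pd

# Toy data for illustration only; these are not the real ratings.
ratings = pd.DataFrame(
    {
        "Lobster Claw": [6, 1, 6, 1],     # scores pile up at the extremes
        "Ferris Bueller": [5, 4, 5, 4],   # scores stay close together
    }
)

# Column-wise sample standard deviation, largest spread first.
spread = ratings.std().sort_values(ascending=False)

print(spread)
```

In this toy frame both rides have fairly high means (3.5 and 4.5), yet their spreads differ sharply, which is exactly the kind of contrast the mean alone hides.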

Ranking the rides by standard deviation reveals some new insights for us – some rides tend to elicit much stronger opinions than others. Our two highest-rated rides, based on overall mean, are the Lobster Claw and the Ferris Bueller. Yet the Ferris Bueller has the third-lowest standard deviation among the group, while the Lobster Claw has the highest. This tells us something interesting, and potentially important when thinking about recommendations – the Lobster Claw seems to elicit much stronger feelings overall (positive and negative), whereas the Ferris Bueller’s ratings are much more steady and consistent.

The bar plots below offer us some insight into these differences, showing the counts for each of the six rating options for the Lobster Claw (left) and the Ferris Bueller (right). The Lobster Claw received more ratings at the 1 and 6 extremes, whereas the Ferris Bueller received more ratings in the middle of the spectrum.
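The counts behind bar plots like these could be tabulated as in the sketch below. The helper name `rating_counts` and all rating values are our own invention for illustration. Reindexing against the full 1 to 6 scale keeps a bar slot for ratings that nobody gave, so the two rides can be compared on identical axes.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; these are not the real ratings.
ratings = pd.DataFrame(
    {
        "Lobster Claw": [6, 1, 6, 1, 6, np.nan],
        "Ferris Bueller": [5, 4, 5, 4, 5, 4],
    }
)

SCALE = range(1, 7)  # the six rating options, 1 through 6


def rating_counts(column):
    """Count how often each rating 1..6 occurs, with zeros for unused ratings."""
    observed = column.dropna().astype(int).value_counts()
    return observed.reindex(SCALE, fill_value=0)


claw_counts = rating_counts(ratings["Lobster Claw"])
bueller_counts = rating_counts(ratings["Ferris Bueller"])

print(claw_counts)
print(bueller_counts)
```

If matplotlib is installed, calling `claw_counts.plot(kind="bar")` on either result should draw the corresponding bar chart.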

Another way we can glean descriptive insights about the rides is by looking at the per-ride totals for NaNs. An interpretation challenge here is that from the dataset alone, we know only that a NaN means the reviewer did not go on the ride – we cannot be completely sure why. Perhaps the reviewer was deterred by a long line and missed the opportunity to take the ride before the park closed on the day that the reviews were submitted. It’s also possible that some simply backed away from certain types of rides – all three of the attractions here with multiple NaN values feature rapid drops from tall heights, which may scare away some riders.
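Per-ride NaN totals can be sketched as follows. The ride names here are deliberate placeholders (`ride_a`, `ride_b`, `ride_c`) and the values are invented, since the text does not say which attractions have the missing ratings.

```python
import numpy as np
import pandas as pd

# Placeholder ride names and invented values, for illustration only.
ratings = pd.DataFrame(
    {
        "ride_a": [5, 4, 5, 3],
        "ride_b": [6, np.nan, 1, np.nan],
        "ride_c": [3, 2, np.nan, 4],
    }
)

# isna() marks each missing cell as True; summing counts the Trues per column.
missing = ratings.isna().sum().sort_values(ascending=False)

print(missing)
```

The counts only tell us how many reviewers skipped each ride; the reasons still have to come from outside the data.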