
7.5 Visualizing model performance: cumulative gains chart, AUC, and ROC


Another way to assess the effectiveness of a predictive model is to plot its sensitivity (true positive rate) against its false positive rate using an ROC (Receiver Operating Characteristic) curve, and to calculate the area under the curve (AUC).

The closer the AUC is to 1, the better the model.  To think about why this is the case, imagine a model that could identify all of the members of the positive outcome class without making a single false positive prediction in the process.  Such a model’s ROC curve would effectively “hug” the y-axis vertically until hitting 1.0, then run flat along the top of the plot, enclosing the entire unit square and yielding an AUC of 1.0.

A model with an AUC of 0.5 is no better than one making random classifications.  In theory, an AUC score could even fall below 0.5, if a model’s ranking of positives and negatives were systematically worse than a random guess.

In the classification example below, we have used a logistic regression model to predict whether a customer will return to Lobster Land. The AUC for this model is 0.69, which makes it nearly 38% better than a random classifier ((0.69 − 0.50) / 0.50 ≈ 0.38).

Viewing the ROC curve and the AUC calculation for this model takes only a few lines of code.
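As a minimal sketch of that step, the scikit-learn code below produces the ROC curve and the AUC score. The names lobster_model, X_test, and y_test are placeholders for a fitted logistic regression model and a held-out test set; they are assumptions for illustration, not the exact variable names from the original analysis.

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

# Predicted probabilities of the positive ("will return") class
y_probs = lobster_model.predict_proba(X_test)[:, 1]

# False positive rate, true positive rate, and the area under the curve
fpr, tpr, thresholds = roc_curve(y_test, y_probs)
auc = roc_auc_score(y_test, y_probs)

plt.plot(fpr, tpr, label=f"Logistic regression (AUC = {auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="Random guess")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate (Sensitivity)")
plt.legend()
plt.show()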

Another way to assess model performance is with a cumulative gains chart, which compares the model against a random-selection baseline. The greater the area between the model’s curve and the baseline, the better the model. However, this type of chart only works in two-outcome class scenarios. If we had a multi-class model (e.g. a model to predict the price category of a rental unit, with outcome classes ‘student budget accommodation’, ‘above average accommodation’, and ‘pricey-digs accommodation’), we would keep one class of interest, such as ‘student budget accommodation’, and collapse the remaining categories into ‘others’, as sketched below.
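The snippet below illustrates that collapsing step. The DataFrame rentals and its price_category column are hypothetical, invented purely for this example.

import pandas as pd

# Hypothetical multi-class outcome column
rentals = pd.DataFrame({
    "price_category": ["student budget accommodation",
                       "above average accommodation",
                       "pricey-digs accommodation",
                       "student budget accommodation"]
})

# Keep 'student budget accommodation' as the class of interest (1) and
# collapse the remaining categories into a single 'others' class (0)
rentals["outcome"] = (rentals["price_category"] == "student budget accommodation").astype(int)
print(rentals)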

Since our dataset contains roughly twice as many actual return customers as one-time visitors, we use balanced accuracy to assess the model.

When evaluated against test data, our model displays a balanced accuracy score of 59.15%. The model’s performance has been compromised by its relatively poor ability to identify instances where the true outcome class is negative.
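A minimal sketch of the balanced accuracy calculation, using scikit-learn and the same placeholder names as above, might look like this:

from sklearn.metrics import balanced_accuracy_score, confusion_matrix

y_pred = lobster_model.predict(X_test)

# Balanced accuracy averages recall across the two classes, so the larger
# "return customer" class cannot mask weak performance on one-time visitors
print(balanced_accuracy_score(y_test, y_pred))

# The confusion matrix shows where the misclassifications occur
print(confusion_matrix(y_test, y_pred))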

While there is room for improvement, the gains chart below lets us assess the model in a different way.  Note that in Google Colab, to build this chart for the first time during your session, you will need to first run this line of code:   !pip install scikit-plot.
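A minimal sketch of the gains chart itself, using the scikit-plot library and the placeholder names from earlier, might look like this:

import matplotlib.pyplot as plt
import scikitplot as skplt

# plot_cumulative_gain() expects the true labels and the full predict_proba()
# output, with one column of probabilities per class
y_probas = lobster_model.predict_proba(X_test)
skplt.metrics.plot_cumulative_gain(y_test, y_probas)
plt.show()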

The interpretation of the cumulative gains curve can be tricky. To do it, let’s start with the orange line above.  The values along the x-axis represent all 1280 test set records, ranked in order of their predicted probability of returning.  At any given point along the x-axis, the solid orange line tells us what share of the actual “1” class members the model has identified by that point, while the black dashed line represents the performance of a model making random guesses.  At x = 0.4, for instance, we’re looking at the 512 records (1280 * 0.4) from the test set that our model predicts to be most likely to belong to the “1” class.  In total, our test set contains 846 records whose actual outcome class is “1.”  Among the 512 records with the highest predict_proba() values for “1” class membership, 411 truly were members of that class.  That means that 48.58% (411/846) of the “1” class members have been identified at this point, and that is the value on the y-axis when the x-axis value is 0.4.

To interpret the blue line, we have to re-think the x-axis for a moment.  Now, our x-axis represents the entire test set, ranked in order of the probability of belonging to the “0” class, according to our model.  At x = 0.4, we are looking at the 512 records with the highest predicted probability of belonging to the “0” class.  Of these, 264 truly belong to the “0” class.  This represents approximately 60% (264/434) of the total “0” class records, which is why the y-axis value for the blue line is at 0.60 when the x-axis value is at 0.40.
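To make those numbers concrete, a short sketch like the one below (again using the placeholder names from earlier) reproduces the orange-line calculation at x = 0.4 by ranking test records on their predicted probability of belonging to the “1” class:

import numpy as np

y_probs = lobster_model.predict_proba(X_test)[:, 1]   # P(class = 1)
y_true = np.asarray(y_test)

# Rank test records from most to least likely to belong to class 1,
# then keep the top 40% (512 of the 1280 test records)
order = np.argsort(y_probs)[::-1]
top_40_pct = order[: int(0.4 * len(y_probs))]

# Share of all actual class-1 members captured within that top 40%;
# for the model described above, this comes out to roughly 0.49
captured = y_true[top_40_pct].sum() / y_true.sum()
print(captured)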