8.6 Assessing the model’s performance


As you saw in Chapter 7, a high accuracy value is not always the "be-all, end-all" goal of a classification model. However, accuracy is an important metric in most instances. Here, we will use it to check for overfitting.

Using the LogisticRegression class from scikit-learn, we fit a model with the x2_train inputs. Note that only the training data are used for model building.
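The fitting step can be sketched as follows. The variable names (x2_train, y_train, logmodel) follow the text, but the data here is a synthetic stand-in, since the Lobster Land dataset itself is not shown.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: four hypothetical predictor columns and a
# binary outcome (1 = renews season pass, 0 = does not).
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 4))
y = (X[:, 0] + X[:, 1] + rng.normal(size=2000) > 0).astype(int)

# Hold out a test set; only the training rows are used to fit the model.
x2_train, x2_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1)

logmodel = LogisticRegression()
logmodel.fit(x2_train, y_train)  # the test rows are never seen here
```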

Next, we generate a set of predictions, predictions1, using the logmodel object, with the x2_train
values as our inputs. Comparing the actual outcomes from y_train against those predictions, we
see that the model attained an accuracy score of 70.3125%.  

With that baseline in mind, we then generate a new set of predictions, predictions2, using the
inputs from x2_test.  These records were not seen by the model during the fitting process.  Comparing predictions2 to the true outcome values for the test set yields an accuracy score of 69.21875%.  Since the two accuracy percentages are very similar, it appears that our model has not been overfit to the training set.
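The train-versus-test comparison can be sketched like this. The data and variable names (x2_train, predictions1, and so on) are synthetic stand-ins mirroring the text; with the real Lobster Land data, the two scores would come out near 70.3% and 69.2%.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (the actual dataset is not shown in the text).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
y = (X[:, 0] - X[:, 2] + rng.normal(size=2000) > 0).astype(int)
x2_train, x2_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1)

logmodel = LogisticRegression().fit(x2_train, y_train)

predictions1 = logmodel.predict(x2_train)  # predictions on data the model saw
predictions2 = logmodel.predict(x2_test)   # predictions on unseen data

train_acc = accuracy_score(y_train, predictions1)
test_acc = accuracy_score(y_test, predictions2)

# If the two scores are close, the model is unlikely to be overfit.
print(train_acc, test_acc)
```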

A confusion matrix is a standard tool for assessing the performance of a classification model, as it compares the model's predicted results against the actual results. Beyond revealing a model's overall accuracy level, the matrix also provides valuable answers to questions such as:

“How good is the model at identifying the families that actually do renew their season pass?” 

“When the model predicts that a family will renew its pass, how often is the model right?”

The screenshot below offers additional insights into the model's performance on the test set.

From the confusion matrix above, we can see that there were 765 times in which the model predicted that a family would renew, and the family did so.  In addition, there were 121 times in which the model correctly predicted that a family would not renew.  We can take those 886 correct predictions and divide by the total size of the test set (1280) to get the 69.21875% overall accuracy.

That process is also demonstrated here, with Python code:
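The arithmetic can be reproduced directly from the counts reported above (recall that in a scikit-learn confusion matrix, rows are actual classes, columns are predicted classes, so correct predictions sit on the main diagonal):

```python
# Counts from the confusion matrix described in the text.
true_negatives = 121   # model predicted "no renewal"; family did not renew
true_positives = 765   # model predicted "renewal"; family renewed
test_set_size = 1280

# Overall accuracy = correct predictions / total test records.
accuracy = (true_positives + true_negatives) / test_set_size
print(accuracy)  # 0.6921875, i.e. 69.21875%
```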

We can gain other insights about the model with a classification report:
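A minimal sketch of that step is below. The labels here are synthetic stand-ins; in the chapter, y_test and predictions2 come from the Lobster Land model fitted earlier.

```python
import numpy as np
from sklearn.metrics import classification_report

# Synthetic stand-in labels: roughly 80% of predictions agree with the truth.
rng = np.random.default_rng(3)
y_test = rng.integers(0, 2, size=200)
predictions2 = np.where(rng.random(200) < 0.8, y_test, 1 - y_test)

# Precision, recall, and f1-score for each class, plus support counts.
report = classification_report(y_test, predictions2)
print(report)
```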

The recall of 90% for the "1" class shows us that, among all the people who truly did renew, our model correctly identified about 9 in 10 of them as renewers.  From the confusion matrix, we can see that there were 765 + 81 = 846 actual renewers in the test set; of those 846 people, the model correctly labeled 765.

You might wonder, "Okay, so what? Why should we care about 90 percent sensitivity?"  When the cost of a False Positive is low, but the gain from a True Positive is high, we will be especially interested in a good sensitivity score.  Suppose, for instance, that in December of each year, Lobster Land sends a free calendar to the home address of every household that it predicts will renew its season pass.  If the cost to produce, print, and ship these calendars is much lower than the benefit that Lobster Land accrues from each renewing household, and the calendars are a positive influence on households' renewal decisions, then we should want a model with a high sensitivity value.

Here is a way to generate the sensitivity score from the confusion matrix, using Python code:
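One such sketch is below, built from the counts reported in the text. The false-positive count of 313 is inferred from the test-set total of 1280 (1280 − 765 − 121 − 81); rows are actual classes and columns are predicted classes, matching scikit-learn's confusion_matrix layout.

```python
import numpy as np

# Confusion matrix from the text (rows: actual 0/1, columns: predicted 0/1).
cm = np.array([[121, 313],
               [ 81, 765]])

true_positives = cm[1, 1]   # actual renewers labeled as renewers
false_negatives = cm[1, 0]  # actual renewers labeled as non-renewers

# Sensitivity (recall for the positive class) = TP / (TP + FN).
sensitivity = true_positives / (true_positives + false_negatives)
print(sensitivity)  # about 0.9043, i.e. roughly 90%
```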

When thinking about the formula below, remember that “True Positive” is synonymous with “True Predicted Positive.”  These are records that the model predicts will belong to the Positive class, and that actually do.  “False Negative” is synonymous with “False Predicted Negative” – these records are predicted to belong to the Negative class, but belong to the Positive class in reality.
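In the usual notation, the formula referenced above is:

```latex
\text{Sensitivity (Recall)} = \frac{TP}{TP + FN} = \frac{765}{765 + 81} \approx 0.904
```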