7.1 Building a Confusion Matrix


The tool that we will use to assess classification model performance is known as a confusion matrix.  

We can build a confusion matrix in Python, but for now, let’s start with a manual version that we can create without any need for software.

Let’s use a hypothetical example here to see how we can set this up. 

Suppose that Lobster Land builds a model that aims to predict season pass renewal.  The model examines 750 families who currently hold season passes, and classifies each one as either “renew” or “non-renew.”  

Before filling in any of the data, we can set up a blank confusion matrix with nine empty boxes, as follows:

                    | actual renew | actual non-renew | TOTAL
predict renew       |              |                  |
predict non-renew   |              |                  |
TOTAL               |              |                  |

Whereas most confusion matrices are built with a 2×2 format, we advocate the inclusion of a Total row, as well as a Total column.  The values in each row can be summed across to arrive at that row’s total, and the values in each column can be summed down to arrive at that column’s total.  Knowing this can sometimes make the setup easier, since a missing cell can be found by subtracting from a known total.

This model predicts that 600 of the families will renew their passes, and that 150 will not.  From among the 600 families predicted by the model to renew, 541 actually do, whereas 59 do not.  From among the 150 families predicted by the model to not renew, 112 do not renew, whereas 38 actually renew.  

                    | actual renew | actual non-renew | TOTAL
predict renew       |          541 |               59 |   600
predict non-renew   |           38 |              112 |   150
TOTAL               |          579 |              171 |   750
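
For readers who would like to check the arithmetic, the totals in the table above can be reproduced with a few lines of plain Python.  This is only a minimal sketch: the four cell counts come from the Lobster Land example, while the variable names (counts, labels, row_totals, col_totals, grand_total) are our own choices for illustration and do not come from any library.

```python
# Cell counts from the Lobster Land example.
# Each key is a (predicted, actual) pair; the names are illustrative only.
counts = {
    ("renew", "renew"): 541,
    ("renew", "non-renew"): 59,
    ("non-renew", "renew"): 38,
    ("non-renew", "non-renew"): 112,
}
labels = ["renew", "non-renew"]

# Row totals: sum across each row (one predicted class at a time).
row_totals = {p: sum(counts[(p, a)] for a in labels) for p in labels}

# Column totals: sum down each column (one actual class at a time).
col_totals = {a: sum(counts[(p, a)] for p in labels) for a in labels}

# The grand total is the sum of the four inner cells.
grand_total = sum(counts.values())

# The row totals and the column totals must both add up to the grand total.
assert sum(row_totals.values()) == grand_total
assert sum(col_totals.values()) == grand_total

# Print the matrix with its Total row and Total column.
rows = [["", "actual renew", "actual non-renew", "TOTAL"]]
for p in labels:
    cells = [str(counts[(p, a)]) for a in labels]
    rows.append([f"predict {p}"] + cells + [str(row_totals[p])])
rows.append(["TOTAL"] + [str(col_totals[a]) for a in labels] + [str(grand_total)])

for row in rows:
    print("".join(cell.ljust(20) for cell in row))
```

If either assertion fails, one of the inner cells or totals was entered incorrectly, which is the same check we can do by hand when filling in the boxes.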