7.1 Building a Confusion Matrix

The tool that we will use to assess classification model performance is known as a confusion matrix: a table that compares the model’s predicted classes against the actual classes.

We can build a confusion matrix in Python, but for now, let’s start with a manual version that we can create without any need for software.

Let’s use a hypothetical example here to see how we can set this up.

Suppose that Lobster Land builds a model that aims to predict season pass renewal. The model examines 750 families who currently hold season passes, and classifies each one as either “renew” or “non-renew.”

Before filling in any of the data, we can set up a blank confusion matrix with nine empty boxes, as follows:

	actual renew	actual non-renew	TOTAL
predict renew
predict non-renew
TOTAL

Whereas most confusion matrices are built with a 2×2 format, we advocate the inclusion of a Total row, as well as a Total column. The values in the confusion matrix can be summed across each row, or down each column, to arrive at the “total” values – and knowing this can sometimes make the setup easier, because a missing cell can be recovered from its total and its neighbor.

The model predicts that 600 of the families will renew their passes, and that 150 will not. Of the 600 families predicted to renew, 541 actually do, whereas 59 do not. Of the 150 families predicted not to renew, 112 do not renew, whereas 38 actually renew. Filling in these four counts and summing each row and column gives the completed matrix:

	actual renew	actual non-renew	TOTAL
predict renew	541	59	600
predict non-renew	38	112	150
TOTAL	579	171	750
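
The same arithmetic can be sketched in a few lines of plain Python. This is only an illustration of the manual steps above, not the library-based approach; the names `counts`, `labels`, `row_totals`, `col_totals`, and `grand_total` are our own choices rather than anything standard.

```python
# Cell counts from the Lobster Land example, keyed by (predicted, actual).
# The variable names here are illustrative assumptions.
labels = ["renew", "non-renew"]
counts = {
    ("renew", "renew"): 541,
    ("renew", "non-renew"): 59,
    ("non-renew", "renew"): 38,
    ("non-renew", "non-renew"): 112,
}

# Sum across each row to get the total for each predicted class.
row_totals = {p: sum(counts[(p, a)] for a in labels) for p in labels}

# Sum down each column to get the total for each actual class.
col_totals = {a: sum(counts[(p, a)] for p in labels) for a in labels}

# The grand total is the number of families examined.
grand_total = sum(counts.values())

# Print the matrix in the same tab-separated layout used above.
print("\t".join(["", "actual renew", "actual non-renew", "TOTAL"]))
for p in labels:
    cells = [counts[(p, a)] for a in labels] + [row_totals[p]]
    print("\t".join([f"predict {p}"] + [str(c) for c in cells]))
print("\t".join(
    ["TOTAL"] + [str(col_totals[a]) for a in labels] + [str(grand_total)]
))
```

A useful check on any hand-built matrix is that the row totals and the column totals each add up to the same grand total, here the 750 families in the example.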