9.6 Assessing a Random Forest Model
By comparing the model’s accuracy against the training set with its accuracy against the test set, we can get a sense of whether the model has been overfit.

Against the training set, the random forest model has attained an impressive 82.29% accuracy. The model’s accuracy against the test data is a tad lower, at 74.06%, but we can still say that these scores are in the same “ballpark.” While this is ultimately a judgment call rather than a rigorous statistical test, we can say here that there does not appear to be a high risk of overfitting.
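As a reference point, the comparison described above can be reproduced with a few lines of scikit-learn code. The sketch below assumes the fitted random forest and the train/test splits from the earlier modeling steps; the names rf_model, X_train, X_test, y_train, and y_test are placeholders rather than the exact names used in the chapter's code.

```python
# A minimal sketch of the train-vs-test accuracy comparison, assuming a fitted
# RandomForestClassifier (rf_model) and the existing train/test splits.
from sklearn.metrics import accuracy_score

train_acc = accuracy_score(y_train, rf_model.predict(X_train))
test_acc = accuracy_score(y_test, rf_model.predict(X_test))

print(f"Training accuracy: {train_acc:.4f}")   # ~0.8229 for the model discussed above
print(f"Test accuracy:     {test_acc:.4f}")    # ~0.7406 for the model discussed above

# A large gap between these two numbers would suggest overfitting; here the
# scores land in the same ballpark.
```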
When comparing the random forest model’s accuracy against the test set to the single tree’s accuracy against the test set, we can see a 4.29% improvement.

At first glance, that may not sound like much. Someone brand new to data modeling would certainly be forgiven for asking at this point whether it was even “worth it” to go through all the trouble of building a new model and iterating through the hyperparameter options to gain this 4.29% jump. So was it worth it? In technical terms, the short answer here is “heck yes.”
Even just for the records in this dataset, the ability to accurately predict a handful more of them could yield major benefits for Lobster Land – certainly enough to justify a few extra minutes of modeling work. The major payoff from the extra work, though, would come from scale – as this model (or any other) is used over and over again with new data, it can continue to deliver rewards.
We can also assess the model by building a classification report, as shown below.

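One way to produce such a report is scikit-learn's classification_report function. The sketch below again assumes the fitted model and test split from earlier in the chapter, with placeholder names.

```python
# Generate a classification report for the random forest's test-set predictions.
# rf_model, X_test, and y_test are assumed from the earlier modeling steps.
from sklearn.metrics import classification_report

test_preds = rf_model.predict(X_test)
print(classification_report(y_test, test_preds))
```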
The concepts of precision, recall, and F1 score are covered in Chapter 7, so we will not rehash them here. In the top part of this report, the row corresponding to the “1” class shows these values for our model’s positive outcome class, renew. In the “0” row, we can see the equivalent values for the other outcome class. The 0 class precision is sometimes called negative predictive value (of the times the model predicts “0”, how often is it correct?). The 0 class recall is equivalent to specificity. The 0 class F1 score is the harmonic mean of negative predictive value and specificity.
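To make the connection to the confusion matrix concrete, here is a short sketch of how those 0-class metrics could be computed directly. It again assumes the fitted model and test split from earlier, with placeholder names.

```python
# Relating the "0" row of the report to the confusion matrix cells.
# rf_model, X_test, and y_test are assumed placeholders from earlier steps.
from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, rf_model.predict(X_test)).ravel()

npv = tn / (tn + fn)                  # 0-class precision (negative predictive value)
specificity = tn / (tn + fp)          # 0-class recall (specificity)
f1_zero = 2 * npv * specificity / (npv + specificity)   # harmonic mean of the two
```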
The support column here just shows the count of actual outcomes in each class: this test set had 434 actual “0” class outcomes and 846 actual “1” class outcomes.
Accuracy earns its own line in the bottom section. We can see here that overall model accuracy was 74%.
The line with “macro average” gives us the arithmetic means of the precision, recall, and F1 scores for the two outcome classes.
The final line, with “weighted average,” provides us with a weighted mean of precision, recall, and F1. Since 66% of the records in the test set are actual “1” outcomes, and 34% are actual “0” outcomes, the weighted averages are obtained by multiplying the 1 class metric by 0.66, multiplying the 0 class metric by 0.34, and summing those values. For instance, the 0.74 weighted-average recall comes from multiplying each class’s recall by its share of the test set and summing the two products, as sketched below.
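Since the per-class recall values are not reproduced in this passage, the sketch below uses hypothetical recalls (0.55 for the 0 class and 0.84 for the 1 class) chosen only to illustrate the arithmetic; the actual values would come from the report above.

```python
# Worked example of the macro and weighted averages described above.
# The per-class recalls here are hypothetical illustration values, NOT the
# model's actual figures; only the support counts (434 and 846) come from the text.
support_0, support_1 = 434, 846
n = support_0 + support_1
w0, w1 = support_0 / n, support_1 / n      # ~0.34 and ~0.66

recall_0, recall_1 = 0.55, 0.84            # hypothetical per-class recalls

macro_recall = (recall_0 + recall_1) / 2           # simple (unweighted) mean
weighted_recall = w0 * recall_0 + w1 * recall_1    # support-weighted mean

print(f"macro avg recall:    {macro_recall:.2f}")
print(f"weighted avg recall: {weighted_recall:.2f}")   # ~0.74 with these values
```

One handy side note: because each class’s recall is its correct predictions divided by its support, the support-weighted average of the recalls always works out to the overall accuracy – which is why the weighted-average recall here matches the 74% accuracy figure.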
