
8.7 Concentrating our marketing efforts


When working with classification models, we often tend to focus on the binary outcomes – is a record predicted to be a 1, or a 0?  This is sensible; after all, classification is literally about assigning observations to the correct groups, or outcome classes.

However, as we saw earlier in this section, a logistic regression model makes its decision based on an underlying probability.  By default, it uses a cutoff of 0.50 – if a record is 50% or more likely to belong to the 1 class, then its predicted outcome class is 1; if not, its predicted outcome class is 0.  
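To make that default cutoff concrete, here is a minimal sketch using synthetic data from scikit-learn's make_classification as a stand-in for the real dataset. It shows that predict() is simply predict_proba() combined with a 0.50 threshold:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data stands in for the real dataset (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)

probs = model.predict_proba(X)[:, 1]          # P(class = 1) for each record
default_labels = model.predict(X)             # uses the built-in 0.50 cutoff
manual_labels = (probs >= 0.5).astype(int)    # applying the cutoff ourselves

print(np.array_equal(default_labels, manual_labels))  # the two approaches agree
```

Nothing about the 0.50 value is sacred; it is simply the default dividing line between the two classes.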

To revisit a term that we first saw in Chapter 1, this is an extreme form of binning – it reduces all of our data to just one of two outcome groups.  That simplification is helpful, but it does not come without a cost.  A record that is, say, 52% likely to belong to the “1” class is probably much more similar to a record that is 46% likely than to one that is 98.5% likely to belong to that class.

In many marketing scenarios, this distinction can matter tremendously.  Think about a mobile phone company with a model that predicts customer churn.  While the categorical outcomes could be helpful, the underlying probabilities are likely to be far more relevant.  Perhaps the company can assume that customers with churn probabilities above 70% are a lost cause, and that those whose churn probabilities are below 30% are safe.  By focusing its outreach energies on the group in the middle, it may be able to zero in on the people for whom the right type of “nudge” could be the key to customer retention.

The predict_proba() method in Python’s scikit-learn allows us to see the probabilities associated with each prediction.  When we look at predict_proba() results, we see two columns for each record.  The first column reflects the probability of not renewing the season pass, while the second column shows the probability of renewing – note that for every record, these two values sum to 1.
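As a hedged illustration – again on synthetic stand-in data rather than the actual season-pass dataset – the two-column structure looks like this:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption); the real model would be fit
# on the season-pass renewal features.
X, y = make_classification(n_samples=400, n_features=5, random_state=1)
model = LogisticRegression().fit(X, y)

proba = model.predict_proba(X)
print(proba[:3])       # column 0: P(class 0, "no renewal"); column 1: P(class 1, "renewal")
print(proba.sum(axis=1)[:3])   # each row of probabilities sums to 1
```

The column order follows model.classes_, so column 1 corresponds to the “1” (renewal) class here.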

Much like the phone company in the hypothetical example above, Lobster Land could use the predict_proba() values to identify different types of customers.  For instance, here are the 10 records from the entire test set that are most likely to renew their passes.
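One way to produce such a top-10 list is sketched below, on synthetic stand-in data; the column names renew_prob and actual are invented for illustration:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Lobster Land data (an assumption for illustration).
X, y = make_classification(n_samples=600, n_features=5, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)
model = LogisticRegression().fit(X_train, y_train)

# Pair each test record's renewal probability with its actual outcome.
results = pd.DataFrame({
    "renew_prob": model.predict_proba(X_test)[:, 1],
    "actual": y_test,
})

# The 10 test-set records most likely to renew.
top10 = results.sort_values("renew_prob", ascending=False).head(10)
print(top10)
```

Sorting by the probability column, rather than by the 0/1 prediction, is what lets us rank households from most to least likely to renew.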

Among this group, nine households actually did renew, as shown below:

Drilling down to specific slices of the records can sometimes be hugely valuable, depending on the context and purpose of a particular model.  For instance, what if we were selling something that involved a costly and lengthy sales pitch, but a very high payoff for conversions?   In such a case, we might not care so much about a model’s overall accuracy.  If the model could perform really well against the slice of records predicted to be most likely to say yes, it could be the equivalent of data mining gold for a company – even if its overall accuracy rate was lower than the naive benchmark.

Even in this case, if we compare our overall model accuracy rate to the naive rate, it might seem unimpressive – our accuracy rate is between 69 and 70 percent, while an approach that just labeled every household as ‘renew’ would have achieved an accuracy of about 66 percent, as demonstrated below.
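The naive-benchmark calculation can be sketched as follows.  The 66 percent renewal share comes from the text, but the labels themselves are simulated here, so the exact number will only approximate it:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Simulated test-set labels (1 = renewed), with roughly a 66% renewal rate --
# an assumption standing in for the real test set.
rng = np.random.default_rng(0)
y_test = (rng.random(1000) < 0.66).astype(int)

# The naive benchmark: label every household "renew."
naive_preds = np.ones_like(y_test)
naive_accuracy = accuracy_score(y_test, naive_preds)
print(round(naive_accuracy, 3))   # approximately the majority-class share
```

A majority-class classifier always scores exactly the prevalence of the majority class, which is why the naive rate here sits near 66 percent.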

Still, our model surpasses that benchmark and, as noted above, achieves a very impressive sensitivity. Furthermore, perhaps Lobster Land would like to use this model to identify the “on-the-fence” group of households who are uncertain about renewal.

The code below isolates the subset of households whose predict_proba() values fall between 0.40 and 0.60.  This group includes approximately 25 percent of the test set – and could perhaps be the group of households most in need of a particular “nudge” from Lobster Land’s outreach team.  As the random sample of 15 households shown here indicates, these look like families that could plausibly go either way on renewal.  Identifying this specific group could prove to be one of the most beneficial outcomes of a model such as this one.
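A sketch of that filtering step, on synthetic stand-in data, follows.  The 0.40–0.60 band matches the text, but the share of records landing in the band will differ from the real 25 percent:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption); flip_y adds label noise so that
# a meaningful share of records lands near the middle of the probability range.
X, y = make_classification(n_samples=800, n_features=5, flip_y=0.2, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)
model = LogisticRegression().fit(X_train, y_train)

results = pd.DataFrame({
    "renew_prob": model.predict_proba(X_test)[:, 1],
    "actual": y_test,
})

# The "on-the-fence" households: renewal probability between 0.40 and 0.60.
fence = results[results["renew_prob"].between(0.40, 0.60)]
print(len(fence) / len(results))   # share of the test set in the band

# A random sample of up to 15 on-the-fence households.
print(fence.sample(n=min(15, len(fence)), random_state=3))
```

Series.between() is inclusive at both ends by default, matching the 0.40-to-0.60 band described above.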

Yet another way we can use this model is to generate predictions for new records.  Suppose that after we build this model, Lobster Land asks us to predict whether we think the Sebastian family will renew its season pass.  We can generate such a prediction.
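Generating a prediction for a brand-new record follows the same pattern.  The feature values below are purely hypothetical placeholders, since the Sebastian family's actual data is not shown here:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in model (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=4, random_state=4)
model = LogisticRegression().fit(X, y)

# Hypothetical feature values for a new household; in practice these would be
# the same columns, in the same order, used to train the model.
new_record = np.array([[0.2, -1.1, 0.5, 0.8]])

print(model.predict(new_record))        # predicted class: renew (1) or not (0)
print(model.predict_proba(new_record))  # the underlying probabilities
```

Because the model expects a 2-D array, even a single new household must be passed as one row of a matrix – hence the double brackets.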