
1.28 Supervised Learning vs. Unsupervised Learning


With supervised learning, the modeler starts with data that includes a target, or response, variable, along with one or more potential explanatory variables.  The algorithm used to build a model from such data identifies patterns that link the inputs to the outcome.

Customer churn is a common supervised learning problem in marketing analytics.  A company studying this problem seeks to identify the common factors among the customers who do not renew their service subscriptions.  

No algorithm anywhere in the world is sophisticated enough to predict customer churn probability on its own, from scratch.  However, there are several effective classification algorithms that, when presented with large numbers of records whose outcome values are known (did they churn, or did they renew?), can identify the important predictors and deliver their findings, often in the form of model coefficients.
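To make this concrete, here is a minimal sketch of one such classification algorithm, logistic regression, fit by plain gradient descent using only the Python standard library. The data is synthetic, and the variable names (support_calls, tenure_years), the coefficients used to simulate churn, and the helper functions (fit_logistic, predict_proba) are all assumptions made up for illustration. In practice a modeler would more likely rely on statistical software than on hand-written code.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Fit a logistic regression by batch gradient descent.

    Returns (intercept, list_of_coefficients).
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Prediction error for this record: predicted probability minus label.
            err = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
            g0 += err
            for j in range(p):
                g[j] += err * xi[j]
        # Step against the average gradient of the log-loss.
        b0 -= lr * g0 / n
        b = [bj - lr * gj / n for bj, gj in zip(b, g)]
    return b0, b


def predict_proba(b0, b, x):
    # Estimated churn probability for one record.
    return sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x)))


# Hypothetical labeled data: each record has two inputs and a known outcome
# (1 = churned, 0 = renewed). The "true" relationship below is invented.
random.seed(0)
X, y = [], []
for _ in range(400):
    support_calls = random.uniform(0, 5)
    tenure_years = random.uniform(0, 5)
    true_p = sigmoid(-0.5 + 1.0 * support_calls - 1.0 * tenure_years)
    X.append([support_calls, tenure_years])
    y.append(1 if random.random() < true_p else 0)

intercept, coefs = fit_logistic(X, y)
print("intercept:", round(intercept, 2))
print("support_calls coefficient:", round(coefs[0], 2))
print("tenure_years coefficient:", round(coefs[1], 2))
```

In this simulated data, the sign of each fitted coefficient indicates the direction in which that input moves the estimated churn probability. The algorithm recovers that direction only because it was shown records whose outcomes were already known.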

Other supervised learning models are built in a similar way.  A labeled dataset already contains the answers to whether a customer renewed her season pass, whether a family redeemed a coupon offer, or how much someone spent on arcade games in the Gold Zone in a single summer.  The modeler uses the results obtained from statistical software, along with some of his own judgment, to determine the model’s final form.

With unsupervised learning, on the other hand, the modeler does not begin with a target variable in mind.  There is no “answer” for the model to learn how to find; instead, the model often identifies distinct groups of records within the data, or other interesting patterns that can be used by a modeler.  

When assessing unsupervised learning models, therefore, there is no “accuracy” metric to speak of – the important question to ask is simply whether the model serves the business needs for which it was built.  Perhaps the most important unsupervised learning technique in marketing analytics is clustering, which will be the subject of Chapter 3.
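As a small preview of what "identifying distinct groups" can look like, the following is a minimal sketch of k-means clustering written with only the Python standard library. The visitor data (visits per season, spend per visit), the two simulated groups, and the function names (dist2, kmeans) are assumptions invented for illustration, and the farthest-point initialization is just one simple, deterministic choice rather than a recommendation. Note that no target variable appears anywhere in the code.

```python
import random


def dist2(a, b):
    # Squared Euclidean distance between two points.
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Group points into k clusters; returns (labels, centers)."""
    # Farthest-point initialization: start from the first point, then
    # repeatedly add the point farthest from the centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centers))
        )
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: dist2(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # Assignments stopped changing, so we are done.
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical unlabeled records: [visits per season, spend per visit].
random.seed(1)
occasional = [[random.gauss(2, 0.5), random.gauss(80, 5)] for _ in range(50)]
frequent = [[random.gauss(10, 1.0), random.gauss(25, 4)] for _ in range(50)]
points = occasional + frequent

labels, centers = kmeans(points, k=2)
for j, c in enumerate(centers):
    print("cluster", j, "center:", [round(v, 1) for v in c])
```

The simulated groups here are deliberately well separated, so the algorithm has an easy job. With real data, inputs measured on very different scales are usually standardized first, and whether the resulting groups are useful remains a business judgment rather than something the algorithm can score on its own.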