1.26 Models: What They Are & Why They Matter
A model is a representation of reality, built from existing observations of some phenomenon that we wish to better understand. A model can be explanatory (how do some inputs lead to a particular outcome?) or it can be predictive (given some set of inputs, what will happen in the future?)
When working with models, we should always acknowledge that they serve an important, but limited, purpose. We should start by acknowledging that, outside of strictly controlled laboratory experiments, we never expect a model to achieve perfect performance.
Especially in marketing analytics, where we often attempt to model outcomes that involve human decisions, we are subject to all the quirks and unpredictability of consumers.
Imagine, for example, trying to model how much a family will spend during a weeklong trip to Lobster Land. Most likely, we could identify several inputs that would be relevant to this outcome (size of the family, income, age of the children and parents, how far they traveled for the vacation, whether they had visited the park before, etc.) Perhaps such a model could provide a reasonably good forecast.
However, such a model could never be perfect – there are too many unknowns to capture. What if, for instance, one of the parents thinks that a work promotion is coming soon – will that lead to higher spending on vacation? Conversely, what if the parents are beginning to worry that the children are being given too much? Or not enough? What if some families never venture out of their hotel room on rainy days, whereas others will rack up huge bills on merchandise shopping and arcade games? We just don’t know.
A common mistake made by students new to data modeling is to assume that a model’s performance can continually improve, until it reaches perfection, if increasingly sophisticated modeling techniques are used. Sometimes, the opposite is in fact true – all else equal, the simplest modeling technique that can perform the job required, and the inclusion of the fewest variables truly needed, is the best way to go.