1.14 Does Domain Knowledge Matter? YES!


Always bear in mind that data modeling is part art and part science. Statistical software and statistical knowledge can tell us things such as whether the relationship between an input variable and an outcome variable is significant. However, the decision to include or exclude a potential input variable ultimately rests with the modeler. That decision is informed in part by the modeler’s familiarity with the dataset and with the business problems that the model is meant to solve.

Domain knowledge can also help you interpret a particular value in a dataset. Imagine that you are a research analyst on Wall Street, assigned to study toy companies operating in the United States. One of the first things you would learn in this role is that such companies typically see sales spike in November and December because of holiday-related purchasing.

Without domain knowledge, someone could see a big jump in fourth-quarter revenue, following dismal numbers in quarters 1 through 3, and take it as the start of a lasting trend. That person might point to the Q4 number, quickly declare that a turnaround has begun, and boldly state that the company’s shares are undervalued. Those who know the industry and its seasonal pattern would be slower to jump to such a conclusion!
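One way to make the seasonal point concrete is to compare a fourth quarter against the same quarter a year earlier, rather than against the quarter just before it. The sketch below is only an illustration: the revenue figures are invented for a hypothetical toy company, and the function names (`pct_change`, `quarter_over_quarter`, `year_over_year`) are ours, not part of any library.

```python
# Illustrative only: invented quarterly revenue (in millions of dollars)
# for a hypothetical toy company. Keys are (year, quarter).
revenue = {
    (2022, 1): 40.0, (2022, 2): 42.0, (2022, 3): 45.0, (2022, 4): 110.0,
    (2023, 1): 38.0, (2023, 2): 39.0, (2023, 3): 41.0, (2023, 4): 104.0,
}


def pct_change(new, old):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


def quarter_over_quarter(revenue, year, quarter):
    """Compare a quarter with the quarter immediately before it."""
    previous = (year, quarter - 1) if quarter > 1 else (year - 1, 4)
    return pct_change(revenue[(year, quarter)], revenue[previous])


def year_over_year(revenue, year, quarter):
    """Compare a quarter with the same quarter one year earlier."""
    return pct_change(revenue[(year, quarter)], revenue[(year - 1, quarter)])


qoq = quarter_over_quarter(revenue, 2023, 4)
yoy = year_over_year(revenue, 2023, 4)

# The naive comparison shows a large jump; the seasonal comparison does not.
print(f"Q4 2023 vs Q3 2023: {qoq:+.1f}%")
print(f"Q4 2023 vs Q4 2022: {yoy:+.1f}%")
```

With these made-up numbers, the quarter-over-quarter figure looks like a dramatic recovery, while the year-over-year figure shows that the holiday quarter was actually slightly weaker than the previous one. Knowing which comparison to run is exactly what domain knowledge supplies.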

Domain knowledge can also help you spot errors in datasets. Suppose that you have lived in New York City for many years. While examining some weather data, you notice that one observation indicates that on a January day in 2015, the high temperature in the city was 40 degrees Celsius. Instantly, you would identify this as a mistake: even on the most unseasonably warm January day, New York City would not come close to that temperature. Most likely, you would conclude, a Fahrenheit reading was mistakenly entered in the column of Celsius values (40 degrees Fahrenheit is roughly 4 degrees Celsius, an ordinary January high). Regardless of the true cause of the error, it is domain knowledge that enables you to spot the problem right away.
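This kind of check can be written down as a simple plausibility rule. The sketch below is a minimal illustration, not a definitive method: the bounds in `JAN_HIGH_RANGE_C` are an assumed range that a New Yorker might consider believable for a January high, the sample readings are invented, and `flag_suspect_highs` is a hypothetical helper name.

```python
# Assumed plausible range (in Celsius) for a January daily high in
# New York City. These bounds are a judgment call, not official records.
JAN_HIGH_RANGE_C = (-20.0, 25.0)


def fahrenheit_to_celsius(temp_f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def flag_suspect_highs(readings, valid_range=JAN_HIGH_RANGE_C):
    """Return readings outside the plausible range.

    For each suspect value, also report what it would be in Celsius
    if it had actually been recorded in Fahrenheit.
    """
    low, high = valid_range
    suspects = []
    for date, temp_c in readings:
        if not (low <= temp_c <= high):
            converted = fahrenheit_to_celsius(temp_c)
            suspects.append({
                "date": date,
                "recorded": temp_c,
                "if_fahrenheit_c": round(converted, 1),
                "plausible_if_fahrenheit": low <= converted <= high,
            })
    return suspects


# Invented sample data: (date, recorded high in the "Celsius" column).
readings = [
    ("2015-01-05", 3.0),
    ("2015-01-06", -6.0),
    ("2015-01-07", 40.0),  # the suspicious value
    ("2015-01-08", 1.5),
]

suspects = flag_suspect_highs(readings)
for row in suspects:
    print(row)
```

The code only flags the value and tests one possible explanation. Choosing sensible bounds, and deciding whether a unit mix-up is really the cause, still depends on the analyst's knowledge of the domain.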

If you are just beginning to tackle a particular problem set, do not worry about starting out without domain knowledge; it is something you will acquire with time. Sometimes, in fact, it can be advantageous for an analyst to lack domain knowledge, because that analyst is more likely to approach a problem without preconceived notions.