6.15 A/B Testing Pitfalls
To run a successful A/B test, it is essential that users are randomly assigned to the two groups. If the split is created in any other way, the results of the test cannot be considered valid. Splitting users by the chronological order of their sign-up date, their total spending, or a demographic attribute such as age introduces a confounding variable into the experiment. Since A/B testing is designed to identify cause-and-effect relationships, a confounding variable throws the results of such a test into question.
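As a concrete illustration, the sketch below assigns users to groups by hashing a user ID. The function name and the 50/50 split are illustrative assumptions rather than a prescription from any particular testing tool; the key property is that the assignment is independent of attributes such as sign-up date, spending, or age.

```python
import hashlib

def assign_group(user_id: str, experiment: str = "button-color-test") -> str:
    """Deterministically assign a user to group 'A' or 'B' via a hash.

    Hashing the user ID (rather than splitting on sign-up date, spending,
    or age) keeps the split independent of user attributes, avoiding the
    confounding described above, while staying stable across visits.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100        # map the hash to a bucket 0-99
    return "A" if bucket < 50 else "B"    # 50/50 split

# Example: the same user always lands in the same group
print(assign_group("user-12345"))
```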
Furthermore, valid interpretation of A/B test results rests on the key assumption that the only variable that changes is the one the experimenter wishes to explore. This ensures that no other confounding variable is influencing the results. Many people run A/B tests sequentially, perhaps changing the button color first (blue or green), then the font type (Comic Sans or Georgia), and so on. The downside of sequential A/B testing is that it cannot capture more complex scenarios in which factors interact. Consider the following scenario:
A/B test 1: Variable that is changed: button color
Winner: Green

| Button color | Font type | ‘Call-to-action’ message on button |
| --- | --- | --- |
| Blue | Georgia | Subscribe now |
| Green | Georgia | Subscribe now |
A/B test 2: Variable that is changed: font type
Winner: Comic Sans

| Button color | Font type | ‘Call-to-action’ message on button |
| --- | --- | --- |
| Green | Comic Sans | Subscribe now |
| Green | Georgia | Subscribe now |
A/B test 3: Variable that is changed: call-to-action message
Winner: Hit me up, baby!

| Button color | Font type | ‘Call-to-action’ message on button |
| --- | --- | --- |
| Green | Comic Sans | Hit me up, baby! |
| Green | Comic Sans | Subscribe now |
In the above scenario, it may be that users prefer green on average but would rather have blue when it is paired with Comic Sans. The problem is that this combination of ‘blue + Comic Sans’ was never offered to users in any of the A/B tests listed above, because green won the first test.
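To make the gap concrete, the short sketch below (the variant lists simply mirror the tables above) enumerates the full factorial of the three factors and reports which combinations the sequential tests never showed to users, the blue + Comic Sans pairings among them.

```python
from itertools import product

colors   = ["Blue", "Green"]
fonts    = ["Georgia", "Comic Sans"]
messages = ["Subscribe now", "Hit me up, baby!"]

# Combinations actually shown to users across the three sequential tests above
tested = {
    ("Blue",  "Georgia",    "Subscribe now"),
    ("Green", "Georgia",    "Subscribe now"),
    ("Green", "Comic Sans", "Subscribe now"),
    ("Green", "Comic Sans", "Hit me up, baby!"),
}

# Full factorial: every combination of the three factors (2 x 2 x 2 = 8)
all_combos = set(product(colors, fonts, messages))

untested = sorted(all_combos - tested)
print(f"{len(untested)} of {len(all_combos)} combinations were never shown:")
for combo in untested:
    print("  ", combo)
```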
That is why some marketers prefer multivariate testing4, which differs from A/B testing in that multiple variables are altered simultaneously. A multivariate test of a website could involve a change in the navigation menu layout and a change in the color scheme. With three different menu layouts and four different color schemes, there would be 12 unique combinations to compare (see the sketch at the end of this section).

An A/B test should also be conducted when as few outside variables as possible could influence someone's actions. An A/B test of a feature on a job search site could be hard to interpret, as many factors can affect whether a job seeker takes a particular action. Presumably, the user of such a site might abruptly stop visiting after being hired, so any test whose results hinge on some future action would be questionable. An online dating site, or a site that matches service providers with clients, could face a similar issue.
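As promised above, here is a minimal sketch of enumerating the variants of such a multivariate test. The layout and color-scheme names are made up for illustration, and the bucketing simply reuses the hashing idea from the earlier A/B example.

```python
from itertools import product
import hashlib

menu_layouts  = ["top-nav", "side-nav", "hamburger"]          # 3 layouts (illustrative names)
color_schemes = ["light", "dark", "high-contrast", "sepia"]   # 4 schemes (illustrative names)

# Full factorial: 3 x 4 = 12 unique combinations to compare
variants = list(product(menu_layouts, color_schemes))

def assign_variant(user_id: str, experiment: str = "nav-color-mvt") -> tuple:
    """Hash the user ID into one of the 12 variant buckets."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

print(f"{len(variants)} variants; user-42 sees {assign_variant('user-42')}")
```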