6.15 A/B Testing Pitfalls
To run a successful A/B test, it is essential that users are randomly assigned to the two groups. If the split is created in any other way, the results of the test cannot be considered valid. Splitting users by the chronological order of their sign-up date, their total spending, or a demographic attribute such as age introduces a confounding variable into the experiment. Since A/B testing is designed to identify cause-and-effect relationships, a confounding variable throws the results of such a test into question.
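As a concrete illustration, the sketch below assigns users to groups by hashing a user ID. The function name and the 50/50 split are illustrative assumptions rather than a prescription from any particular testing tool; the key property is that the assignment is independent of attributes such as sign-up date, spending, or age.

```python
import hashlib

def assign_group(user_id: str, experiment: str = "button-color-test") -> str:
    """Deterministically assign a user to group 'A' or 'B' via a hash.

    Hashing the user ID (rather than splitting on sign-up date, spending,
    or age) keeps the split independent of user attributes, avoiding the
    confounding described above, while staying stable across visits.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100        # map the hash to a bucket 0-99
    return "A" if bucket < 50 else "B"    # 50/50 split

# Example: the same user always lands in the same group
print(assign_group("user-12345"))
```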
Furthermore, valid interpretation of A/B test results rests on the key assumption that the only variable that changes is the one the experimenter wishes to explore. This ensures that no other confounding variable is influencing the results. Many people run A/B tests sequentially, perhaps changing the button color first (blue or green), then the font type (Comic Sans or Georgia), and so on. The downside of sequential A/B testing is that it cannot capture more complex scenarios in which factors interact. Consider the following scenario:
A/B test 1: Variable that is changed: button color
Winner: Green

| Button color | Font type | ‘Call-to-action’ message on button |
| --- | --- | --- |
| Blue | Georgia | Subscribe now |
| Green | Georgia | Subscribe now |
A/B test 2: Variable that is changed: font type
Winner: Comic Sans

| Button color | Font type | ‘Call-to-action’ message on button |
| --- | --- | --- |
| Green | Comic Sans | Subscribe now |
| Green | Georgia | Subscribe now |
A/B test 3: Variable that is changed: call-to-action message
Winner: Hit me up, baby!

| Button color | Font type | ‘Call-to-action’ message on button |
| --- | --- | --- |
| Green | Comic Sans | Hit me up, baby! |
| Green | Comic Sans | Subscribe now |
In the above scenario, it may be that users prefer green on average but would rather have blue when it is paired with Comic Sans. The problem is that this combination of ‘blue + Comic Sans’ was never offered to users in any of the A/B tests listed above, because green won the first test.
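To make the gap concrete, the short sketch below (the variant lists simply mirror the tables above) enumerates the full factorial of the three factors and reports which combinations the sequential tests never showed to users, the blue + Comic Sans pairings among them.

```python
from itertools import product

colors   = ["Blue", "Green"]
fonts    = ["Georgia", "Comic Sans"]
messages = ["Subscribe now", "Hit me up, baby!"]

# Combinations actually shown to users across the three sequential tests above
tested = {
    ("Blue",  "Georgia",    "Subscribe now"),
    ("Green", "Georgia",    "Subscribe now"),
    ("Green", "Comic Sans", "Subscribe now"),
    ("Green", "Comic Sans", "Hit me up, baby!"),
}

# Full factorial: every combination of the three factors (2 x 2 x 2 = 8)
all_combos = set(product(colors, fonts, messages))

untested = sorted(all_combos - tested)
print(f"{len(untested)} of {len(all_combos)} combinations were never shown:")
for combo in untested:
    print("  ", combo)
```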
That is why some marketers prefer multivariate testing4, which differs from A/B testing in that multiple variables are altered simultaneously. A multivariate test of a website could involve a change in the navigation menu layout and a change in the color scheme. With three different menu layouts and four different color schemes, there would be 12 unique combinations to compare (see the sketch at the end of this section).

An A/B test should also be conducted when as few outside variables as possible could influence someone's actions. An A/B test of a feature on a job search site could be hard to interpret, as many factors can affect whether a job seeker takes a particular action. Presumably, the user of such a site might abruptly stop visiting after being hired, so any test whose results hinge on some future action would be questionable. An online dating site, or a site that matches service providers with clients, could face a similar issue.
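As promised above, here is a minimal sketch of enumerating the variants of such a multivariate test. The layout and color-scheme names are made up for illustration, and the bucketing simply reuses the hashing idea from the earlier A/B example.

```python
from itertools import product
import hashlib

menu_layouts  = ["top-nav", "side-nav", "hamburger"]          # 3 layouts (illustrative names)
color_schemes = ["light", "dark", "high-contrast", "sepia"]   # 4 schemes (illustrative names)

# Full factorial: 3 x 4 = 12 unique combinations to compare
variants = list(product(menu_layouts, color_schemes))

def assign_variant(user_id: str, experiment: str = "nav-color-mvt") -> tuple:
    """Hash the user ID into one of the 12 variant buckets."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

print(f"{len(variants)} variants; user-42 sees {assign_variant('user-42')}")
```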