
6.6 Assessing the Results:  Statistical Tests


After we conduct an experiment and gather all the related data, our work is still not done: we must assess the results we have gathered. Part of that assessment involves using a statistically valid process to determine whether a difference between two numbers is truly meaningful. Of course, if our goal were simply to answer the direct question “Are these numbers different?” then we would not need statistical tests at all; we could just compare the two values.

Intuitively, people understand the gap between “different” and “meaningfully different” without needing any knowledge of statistics or its terminology. We know that two quantities’ values might not be exactly the same, yet the difference between them could be so small as to be unimportant, attributable to random chance, or both.

If a friend of yours told you that a coin was fair before flipping it, you would have no reason to doubt your friend’s claim. 

After one flip, when your friend shows you a result of “Tails”, you are not surprised – after all, that outcome had a 50/50 probability of occurring.  If he flips it a second time, and shows another “Tails” result, that would still not feel strange at all.  If you are familiar with probability rules, you know that there was a 1-in-4 chance of such an outcome occurring.  After a third, or perhaps even fourth consecutive “Tails” flip, your level of suspicion may start to rise – but you also know that such a result isn’t so completely abnormal that you should stop trusting your friend.

After 20 consecutive “Tails” flips, though, your good-natured willingness to trust this friend would have given way to some skepticism, followed by outright disbelief. Is such a result impossible? No, you realize. However, you also know that if your friend’s claim really were true, such a result would be extremely unlikely: a fair coin produces 20 tails in a row only about once in a million attempts.

Somewhere, therefore, between the fourth flip and the 20th flip, your trust in your friend gave way to healthy skepticism, and then to complete doubt. But when, exactly, should you start to disbelieve your coin-flipping buddy? Rather than relying only on your own instincts and wits, you would need a methodical process that you could apply consistently to such problems. Statistical testing is how we quantify this process and apply some rigor to it.
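To put numbers on the coin-flip story, the short Python sketch below computes the probability that a fair coin lands “Tails” on every one of n flips. It assumes the flips are independent and that the coin really is fair (probability 1/2 of tails on each flip); the function name `prob_all_tails` is ours, chosen only for illustration.

```python
from fractions import Fraction


def prob_all_tails(n_flips: int) -> Fraction:
    """Probability that a fair coin shows tails on all n_flips independent flips."""
    if n_flips < 0:
        raise ValueError("n_flips must be non-negative")
    # Each flip has probability 1/2 of tails; independent flips multiply.
    return Fraction(1, 2) ** n_flips


# Flip counts mentioned in the story: 1, 2, 3, 4, and 20.
for n in (1, 2, 3, 4, 20):
    p = prob_all_tails(n)
    print(f"{n:>2} flips: 1 in {p.denominator:,}")
```

Two flips give the 1-in-4 chance mentioned earlier, four flips give 1 in 16, and 20 flips give 1 in 1,048,576. The calculation shows how quickly the streak becomes improbable under the “fair coin” claim, but it does not by itself say where to draw the line; that is the job of a statistical test.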