
6.12 Independent Sample t-Test


An independent sample t-test goes by many names: two-sample t-test, independent t-test, Student's t-test. This inferential statistical test can be used when comparing the means of two groups of normally distributed, numeric data.

The assumptions of the t-test are the same regardless of whether it is two-tailed or one-tailed:

  • The data is continuous numerical data, e.g. a person's height, weight, or IQ score
  • The data is randomly sampled and representative of the population
  • The data is normally distributed
  • The samples from both groups have approximately the same variance
  • The observations in one group are independent of the observations in the other
  • The dependent variable is continuous
  • The independent variable is a categorical variable with two levels

The formula for the two-sample t-test has the mean values for each of the two groups in its numerator; in its denominator we use the variance and the count for each of the two groups. Note that the independent sample t-test does not require us to have the same number of records in each group.
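Written out, the formula looks like this (shown here in the unequal-variance, or Welch, form, which matches the equal_var=False option used in the scipy code later in this section):

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```

Here x̄₁ and x̄₂ are the group means, s₁² and s₂² the group variances, and n₁ and n₂ the group counts.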

When the outcome is a normally distributed numeric variable, a common statistical test for comparing two groups is the independent sample t-test.  A t-test will generate two values:  a t-value (which summarizes the overall difference between the groups), and a p-value (which tells us the likelihood that a t-value this extreme could have been obtained by random chance).  We determine our confidence level by subtracting our significance threshold from 1.  A p-value of .05 or smaller, therefore, is required to meet a 95 percent confidence level.

To see how the t-statistic and p-value are related, let’s start by taking a look at the t-distribution.

The plot above shows the t-distribution for three different degrees of freedom (df).  For a single sample, the df value is the total number of observations minus 1.  Note that the shape of the t-distribution changes based on the df.  At lower df values, the peak tends to be a bit shorter and the tails tend to be a bit “fatter,” implying that values far from the mean are more likely to occur than they are at larger df values.
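We can check the “fatter tails” behavior numerically with scipy (a small sketch; the df values here are arbitrary choices for illustration):

```python
from scipy import stats

# Density of the t-distribution at t = 3, i.e. far out in the tail.
# Lower df puts more probability in the tails, so the density at 3
# shrinks as df grows.
for df in (1, 5, 30):
    print(df, stats.t.pdf(3, df))
```

Running this shows the density at t = 3 falling steadily as df increases, which is exactly the “fatter tails at low df” pattern described above.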

Technically speaking, the area under the curve extends infinitely far in both directions; any t-statistic is theoretically possible to attain, so the tails go on forever.  Practically speaking, however, since roughly 99 percent of the area lies within three standard deviations of the mean, we can think of the area shown here as the total area underneath the curve.  About 68 percent of this area falls within one standard deviation of the mean.

If we could randomly generate any t-statistic under this curve, there is a 31.75 percent chance that it would be more extreme than 1 (and by more extreme, we mean further away from 0, in either direction).  That figure corresponds to the unshaded portion of the graph above. 

Okay, but what about a t-statistic of two?  Let’s take a look.

Now, when we move two standard deviations away from the mean, a much larger proportion of the area under the curve is shaded.  If we were obtaining t-statistics randomly, the chance that we would obtain something more extreme than 2 is fairly small.
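These tail areas can be computed directly with scipy's survival function (a sketch; the df value here is a hypothetical large number, under which the results land close to the figures quoted above):

```python
from scipy import stats

df = 100  # hypothetical degrees of freedom, chosen for illustration

# Probability of a t-statistic more extreme than 1 or 2
# (further from 0 in either direction, hence the factor of 2)
beyond_1 = 2 * stats.t.sf(1, df)
beyond_2 = 2 * stats.t.sf(2, df)
print(beyond_1, beyond_2)
```

The first value comes out near 32 percent, while the second is only around 5 percent, matching the intuition that a t-statistic beyond 2 is fairly unlikely to arise by chance.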

To perform a t-test, we need the mean value, the standard deviation, and the number of observations for each of the two groups we wish to compare (or, equivalently, the raw values themselves).  In the example below, we will perform a t-test to help Lobster Land better understand the impact of part of its email marketing campaign from last summer.
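When only the summary statistics are available, scipy's stats.ttest_ind_from_stats function accepts exactly these inputs. The numbers below are purely hypothetical, just to show the shape of the call:

```python
from scipy import stats

# Hypothetical summary statistics for two groups of 750 subscribers each
t, p = stats.ttest_ind_from_stats(mean1=25.0, std1=3.0, nobs1=750,
                                  mean2=24.0, std2=3.1, nobs2=750,
                                  equal_var=False)
print(t, p)
```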

Last July, Lobster Land held an event known as “LobsterPalooza,” which featured several popular country music artists.  Two weeks prior to the event, Lobster Land performed an A/B test using emails sent to 1500 people who had already purchased tickets.  750 of the subscribers (Group 1) saw an email that described the event, mentioned the acts in the line-up, and included some basic information about the performance dates and times.  Lobster Land included the following call to action at the bottom of the e-mail:  “Do not miss this wonderful chance for family fun!  Concert merchandise will be on sale during the event — be sure to grab a hat, mug, or t-shirt as a way to commemorate this special, star-studded occasion.”  Another 750 subscribers (Group 2) saw an e-mail message that was otherwise identical, but lacked the call-to-action statement at the bottom.

The null hypothesis of this experiment is that the call-to-action does not meaningfully impact a consumer’s actions.  To analyze whether it did, we will use the dataset abtest.csv.  This dataset includes a unique userID for each recipient of the e-mail, along with a column that tells us whether the person was in group 1 (received the email with the call-to-action) or group 2 (received the email without it).  The last column, concert, tells us how much the person spent at Lobster Land on non-ticket expenditures on the evening of LobsterPalooza; this dollar value could include food, beverage, or merchandise sales.

import pandas as pd
abtest = pd.read_csv("abtest.csv")
abtest.head()

After seeing the first few rows to familiarize ourselves with abtest, we can use the describe() function to get a sense of the distribution of each variable.
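A sketch of that step is shown below; since abtest.csv itself is not reproduced here, this builds a tiny synthetic frame with the same three columns, purely for illustration:

```python
import pandas as pd

# Tiny synthetic stand-in for abtest.csv (hypothetical values)
abtest = pd.DataFrame({
    'userID': [101, 102, 103, 104, 105, 106],
    'group': [1, 1, 1, 2, 2, 2],
    'concert': [24.50, 26.10, 25.30, 23.75, 22.90, 24.10],
})
# Count, mean, std, min, quartiles, and max for each numeric column
print(abtest.describe())
```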

This is an okay start for us: it tells us, for example, that mean spending came in just under $25, and was very tightly clustered around that value.  Since our overall goal here is to make a comparison between those who did or did not receive the call-to-action, a grouping operation may offer us even more insight.  Thankfully, pandas makes this possible with a groupby operation:
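That grouping step can be sketched as follows (again using a small synthetic stand-in for abtest.csv, since the real file is not reproduced here):

```python
import pandas as pd

# Synthetic stand-in for abtest.csv (hypothetical values)
abtest = pd.DataFrame({
    'userID': [101, 102, 103, 104, 105, 106],
    'group': [1, 1, 1, 2, 2, 2],
    'concert': [24.50, 26.10, 25.30, 23.75, 22.90, 24.10],
})
# One row of averages per group; note that userID gets averaged too,
# since pandas treats it as just another numeric column
print(abtest.groupby('group').mean())
```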

The table above shows a comparison of results for each group.  Unfortunately, Python cannot read our minds (at least, not yet!) and it does not know that UserID is just a unique value assigned to each customer.  The Python interpreter sees UserID as a numeric variable, so we are seeing a lot of summary statistics for UserID displayed here.  By adding just a bit more specificity to our code, we can finally see the group-to-group comparison, with nothing else clouding our view.
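Restricting the grouping to the concert column accomplishes that. A self-contained sketch (built on a small synthetic stand-in for abtest.csv):

```python
import pandas as pd

# Synthetic stand-in for abtest.csv (hypothetical values)
abtest = pd.DataFrame({
    'userID': [101, 102, 103, 104, 105, 106],
    'group': [1, 1, 1, 2, 2, 2],
    'concert': [24.50, 26.10, 25.30, 23.75, 22.90, 24.10],
})
# Summary statistics for concert spending only, one row per group
print(abtest.groupby('group')['concert'].describe())
```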

To get the t-statistic manually, we could apply the formula shown at the top of this page:

In that formula, x̄₁ and x̄₂ represent the mean values for groups 1 and 2, respectively; s₁² and s₂² represent the variances for groups 1 and 2; and n₁ and n₂ represent the counts for groups 1 and 2.
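As a sketch, here is that manual computation on two small hypothetical samples, checked against scipy's result:

```python
import numpy as np
from scipy import stats

# Two small hypothetical samples
g1 = np.array([24.5, 26.1, 25.3, 27.0])
g2 = np.array([23.7, 22.9, 24.1, 23.2])

# Applying the formula directly: the difference in means, divided by
# the square root of the summed (variance / count) terms
t_manual = (g1.mean() - g2.mean()) / np.sqrt(
    g1.var(ddof=1) / len(g1) + g2.var(ddof=1) / len(g2))

# scipy's unequal-variance t-test produces the same t-statistic
t_scipy, p = stats.ttest_ind(g1, g2, equal_var=False)
print(t_manual, t_scipy)
```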

Here, instead of walking through the manual approach, we will use a function from scipy to help us out.  In the code below, we will specify the group memberships and the outcome values that we wish to compare. 

from scipy import stats
t, p = stats.ttest_ind(abtest.loc[abtest['group'] == 1, 'concert'].values,
                       abtest.loc[abtest['group'] == 2, 'concert'].values,
                       equal_var=False)

The p-value for our t-test is extremely low.  At a confidence level far greater than 99 percent, we can state that the variation in concert spending between members of the two groups is not the result of random chance.  We reject the null hypothesis that there is no meaningful difference between the groups.

Other statistical tests include but are not limited to the following:

  • a) Chi-squared test of independence

What is it for: Tests if two categorical variables are related

Assumptions:

  • The data in the cells should be frequencies or counts, not percentages
  • Each level should be mutually exclusive
  • An expected frequency of at least 5 in each cell

Sample scenario:

Is there a relationship between the marital status of Lobster Land employees and their highest education level?
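A sketch of that scenario with scipy; the counts below are entirely hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical counts: marital status (rows) by highest education (columns)
observed = np.array([[30, 25, 10],
                     [20, 35, 15]])

# chi2_contingency computes the test statistic, p-value, degrees of
# freedom, and the expected counts under independence
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(chi2, p, dof)
```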

  • b) One-tailed t-test

What is it for: Tests if the means of two groups are statistically different, in a specified direction. Used when the independent variable has two levels. In the sample scenario below, those levels would be male and female.

Assumptions:

  • The data is continuous numerical data e.g. a person’s height, weight, or IQ score
  • The data is randomly sampled, and representative of the population
  • The data is normally distributed
  • Samples from both variables have approximately the same variance
  • The observations of Variable A are independent from Variable B
  • The dependent variable is continuous
  • The independent variable is a categorical variable with two levels

Note: The difference between the 2-tailed t-test and the 1-tailed t-test lies in the way the hypothesis is phrased.  In the sample scenarios shown below, the 1-tailed t-test is used when the alternative hypothesis assumes a direction: that males could spend more on merchandise than females. The 2-tailed t-test, on the other hand, is called upon when the hypothesis does not assume a direction.

Sample scenario (1-tailed t-test):
Do males tend to spend more on Lobster Land merchandise than females?

Sample scenario (2-tailed t-test):
There is no difference in the amount of money male and female visitors spend at Lobster Land.
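A sketch of the one-tailed version in scipy (the alternative= keyword requires scipy 1.6 or later; the spending data here is simulated, not real Lobster Land data):

```python
import numpy as np
from scipy import stats

# Simulated merchandise spending for two hypothetical groups
rng = np.random.default_rng(42)
male = rng.normal(27, 4, 50)
female = rng.normal(25, 4, 50)

# One-tailed test: the alternative hypothesis is that male
# spending is greater than female spending
t, p_one = stats.ttest_ind(male, female, equal_var=False,
                           alternative='greater')
print(t, p_one)
```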

  • c) One way analysis of variance (ANOVA)

What is it for: Tests if the means of two or more groups are statistically different. Used when the independent variable has more than two levels.

Assumptions:

  • The dependent variable is continuous
  • The independent variable is a categorical variable with two or more levels
  • The observations in each variable must be independent
  • The data when plotted forms a normal distribution
  • The data is randomly sampled, and representative of the population
  • There are no outliers
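A sketch of a one-way ANOVA in scipy, using three hypothetical groups of spending values:

```python
from scipy import stats

# Hypothetical spending for three visitor segments
weekday = [24.1, 25.3, 26.0, 24.8]
weekend = [22.5, 23.1, 22.9, 23.6]
holiday = [27.2, 26.8, 28.0, 27.5]

# f_oneway returns the F-statistic and the p-value for the null
# hypothesis that all group means are equal
f, p = stats.f_oneway(weekday, weekend, holiday)
print(f, p)
```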