Learning Outcomes
By the end of this section, you should be able to:
- 4.2.1 Apply hypothesis testing methods to test statistical claims involving one sample.
- 4.2.2 Use Python to assist with hypothesis testing calculations.
- 4.2.3 Conduct hypothesis tests to compare two means, two proportions, and matched pairs data.
Constructing confidence intervals is one method used to estimate population parameters. Another important statistical method to test claims regarding population parameters is called hypothesis testing.
Hypothesis testing is an important tool for data scientists in that it is used to draw conclusions using sample data, and it is also used to quantify uncertainty associated with these conclusions.
As an example to illustrate the use of hypothesis testing in a data science application, consider an online retailer seeking to increase revenue through a text messaging ad campaign. The retailer is interested in examining the hypothesis that a text messaging ad campaign leads to a corresponding increase in revenue. To test the hypothesis, the retailer sends targeted text messages to certain customers but does not send the text messages to another group of customers. The retailer then collects data for purchases made by customers who received the text message ads versus customers who did not receive the ads. Using hypothesis testing methods, the online retailer can come to a conclusion regarding the marketing campaign to transmit ads via text messaging.
Hypothesis testing involves setting up a null hypothesis and an alternative hypothesis based on a claim to be tested and then collecting sample data. A null hypothesis represents a statement of no effect or no change in the population; it states that the value of a population parameter is equal to a specified value. An alternative hypothesis is a complementary statement to the null hypothesis and states that the population parameter has a value that differs from the value in the null hypothesis.
Hypothesis testing allows a data scientist to assess the likelihood of observing results under the assumption of the null hypothesis and make judgements about the strength of the evidence against the null hypothesis. It provides the foundation for various data science investigations and plays a role at different stages of the analyses such as data collection and validation, modeling related tasks, and determination of statistical significance.
For example, a local restaurant might claim that the average delivery time for food orders is 30 minutes or less. Here the claim (a mean delivery time of at most 30 minutes) serves as the null hypothesis. To test the claim, food orders can be placed and the corresponding delivery times recorded. If the sample data appears to be inconsistent with the null hypothesis, then the decision is to reject the null hypothesis.
Testing Claims Based on One Sample
For our discussion, we will examine hypothesis testing methods for claims involving means or proportions. (Hypothesis testing can also be performed for standard deviation or variance, though that is beyond the scope of this text.)
The steps for hypothesis testing are as follows:
- Set up a null and alternative hypothesis based on the claim. Identify whether the null or the alternative hypothesis represents the claim.
- Collect relevant sample data to investigate the claim.
- Determine the correct distribution to be used to analyze the hypothesis test (e.g., normal distribution or t-distribution).
- Analyze the sample data to determine if the sample data is consistent with the null hypothesis. This will involve statistical calculations involving what is called a test statistic and a p-value (discussed in the next section).
- If the sample data is inconsistent with the null hypothesis, then the decision is to “reject the null hypothesis.” If the sample data is consistent with the null hypothesis, then the decision is to “fail to reject the null hypothesis.”
The first step in the hypothesis testing process is to write what is called a “null” hypothesis and an “alternative” hypothesis. These hypotheses will be statements regarding an unknown population parameter, such as a population mean or a population proportion. These two hypotheses are based on the claim under study and will be written as complementary statements to each other. When one of the hypotheses is true, the other hypothesis must be false (since they are complements of one another). Note that either the null or alternative hypothesis can represent the claim.
The null hypothesis is labeled as $H_0$ and is a statement of equality. For example, the null hypothesis might state that the mean time to complete an exam is 42 minutes. For this example, the null hypothesis would be written as:
$$H_0: \mu = 42$$
Where the notation $\mu$ refers to the population mean time to complete an exam.
The null hypothesis will always contain one of the following three symbols: $=$, $\le$, $\ge$.
The alternative hypothesis is labeled as $H_a$ and will be the complement of the null hypothesis. For example, if the null hypothesis contains an equals symbol, then the alternative hypothesis will contain a not equals symbol.
If the null hypothesis contains a $\le$ symbol, then the alternative hypothesis will contain a $>$ symbol.
If the null hypothesis contains a $\ge$ symbol, then the alternative hypothesis will contain a $<$ symbol.
Note that the alternative hypothesis will always contain one of the following three symbols: $\ne$, $>$, $<$.
To write the null and alternative hypotheses, start off by translating the claim into a mathematical statement involving the unknown population parameter. Depending on the wording, the claim can be placed in either the null or the alternative hypothesis. Then write the complement of this statement as the other hypothesis (see Table 4.6).
When translating claims about a population mean, use the symbol $\mu$ as part of the null and alternative hypotheses. When translating claims about a population proportion, use the symbol $p$ as part of the null and alternative hypotheses.
Table 4.6 provides the three possible setups for the null and alternative hypotheses when conducting a one-sample hypothesis test for a population mean. Notice that for each setup, the null and alternative hypotheses are complements of one another. Note that the value of “$\mu_0$” will be replaced by some numerical constant taken from the stated claim.
Setup A | Setup B | Setup C |
---|---|---|
$H_0: \mu = \mu_0$ | $H_0: \mu \le \mu_0$ | $H_0: \mu \ge \mu_0$ |
$H_a: \mu \ne \mu_0$ | $H_a: \mu > \mu_0$ | $H_a: \mu < \mu_0$ |
Table 4.7 provides the three possible setups for the null and alternative hypotheses when conducting a one-sample hypothesis test for a population proportion. Note that the value of “$p_0$” will be replaced by some numerical constant taken from the stated claim.
Setup D | Setup E | Setup F |
---|---|---|
$H_0: p = p_0$ | $H_0: p \le p_0$ | $H_0: p \ge p_0$ |
$H_a: p \ne p_0$ | $H_a: p > p_0$ | $H_a: p < p_0$ |
Follow these steps to determine which of these setups to use for a given hypothesis test:
- Determine if the hypothesis test involves a claim for a population mean or population proportion. If the hypothesis test involves a population mean, use either Setup A, B, or C.
- If the hypothesis test involves a population proportion, use either Setup D, E, or F.
- Translate a phrase from the claim into a mathematical symbol and then match this mathematical symbol with one of the setups in Table 4.6 and Table 4.7.
For example, if a claim for a hypothesis test for a mean references the phrase “at least,” note that the phrase “at least” corresponds to a greater than or equals symbol. A greater than or equals symbol appears in Setup C in Table 4.6, so that would be the correct setup to use.
If a claim for a hypothesis test for a proportion references the phrase “different than,” note that the phrase “different than” corresponds to a not equals symbol. A not equals symbol appears in Setup D in Table 4.7, so that would be the correct setup to use.
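The translation from a claim's comparison symbol to a pair of hypotheses can be sketched in a few lines of Python. This is only an illustration: the names `COMPLEMENT`, `NULL_SYMBOLS`, and `write_hypotheses` are hypothetical helpers, not part of any library, and the value 10 in the usage line is a made-up constant.

```python
# Each comparison symbol paired with its complement.
COMPLEMENT = {"=": "!=", "!=": "=", "<=": ">", ">": "<=", ">=": "<", "<": ">="}

# Symbols that contain equality always belong in the null hypothesis.
NULL_SYMBOLS = {"=", "<=", ">="}


def write_hypotheses(parameter, claim_symbol, value):
    """Return (null, alternative, where the claim sits) for a one-sample claim."""
    other = COMPLEMENT[claim_symbol]
    if claim_symbol in NULL_SYMBOLS:
        null_symbol, alt_symbol, claim_in = claim_symbol, other, "null"
    else:
        null_symbol, alt_symbol, claim_in = other, claim_symbol, "alternative"
    return (
        f"H0: {parameter} {null_symbol} {value}",
        f"Ha: {parameter} {alt_symbol} {value}",
        claim_in,
    )


# "At least 10" translates to >=, which places the claim in the null hypothesis.
print(write_hypotheses("mu", ">=", 10))
```

The helper only encodes the rule that the symbol containing equality goes in the null hypothesis; translating the wording of a claim into a symbol is still done by the reader.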
Example 4.11
Problem
Write the null and alternative hypotheses for the following claim:
An auto repair facility claims the average time for an oil change is less than 20 minutes.
Solution
Note that the claim involves the phrase “less than,” which translates to a less than symbol. A less than symbol must be used in the alternative hypothesis. The complement of less than is a greater than or equals symbol, which will be used in the null hypothesis. The claim refers to the population mean time for an oil change, so the symbol $\mu$ will be used. Notice that this setup will correspond to Setup C from Table 4.6. Thus the setup of the null and alternative hypotheses will be as follows:
$$H_0: \mu \ge 20$$
$$H_a: \mu < 20$$
Example 4.12
Problem
Write the null and alternative hypotheses for the following claim:
A medical researcher claims that the proportion of adults in the United States who are smokers is at most 25%.
Solution
Note that the claim involves the phrase “at most,” which translates to a less than or equals symbol. A less than or equals symbol must be used in the null hypothesis. The complement of less than or equals is a greater than symbol, which will be used in the alternative hypothesis. The claim refers to the population proportion of smokers, so the symbol $p$ will be used. Notice that this setup will correspond to Setup E from Table 4.7. Thus, the setup of the null and alternative hypotheses will be as follows:
$$H_0: p \le 0.25$$
$$H_a: p > 0.25$$
The basis of the hypothesis test procedure is to assume that the null hypothesis is true to begin with and then examine the sample data to determine if the sample data is consistent with the null hypothesis. Based on this analysis, you will decide to either reject the null hypothesis or fail to reject the null hypothesis.
It is important to note that these are the only two possible decisions for any hypothesis test, namely:
- Reject the null hypothesis, or
- Fail to reject the null hypothesis.
Keep in mind that this decision will be based on statistical analysis of sample data, so there is a small chance of coming to a wrong decision. The only way to be absolutely certain of your decision is to test the entire population, which is typically not feasible. Since the hypothesis test decision is based on a sample, there are two possible errors that can be made by the researcher:
- Type I error: the researcher rejects the null hypothesis when the null hypothesis is actually true.
- Type II error: the researcher fails to reject the null hypothesis when the null hypothesis is actually false.
In hypothesis testing, the maximum allowed probability of making a Type I error is called the level of significance, denoted by the Greek letter alpha, $\alpha$. From a practical standpoint, the level of significance is the probability value used to determine when the sample data indicates significant evidence against the null hypothesis. This level of significance is typically set to a small value, which indicates that the researcher wants the probability of rejecting a true null hypothesis to be small. Typical values of the level of significance used in hypothesis testing are as follows: $\alpha = 0.01$, $\alpha = 0.05$, or $\alpha = 0.10$.
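To see what the level of significance controls, the following sketch simulates many samples from a population in which the null hypothesis is actually true and counts how often a two-tailed z-test rejects it. All numbers here (hypothesized mean 50, standard deviation 10, sample size 40, 20,000 trials, seed 0) are illustrative assumptions rather than values from the text.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=0)

alpha = 0.05                        # level of significance
mu_0, sigma, n = 50.0, 10.0, 40     # assumed null mean, known std dev, sample size
n_trials = 20_000                   # number of simulated studies

# Draw every sample from a population where the null hypothesis is true.
samples = rng.normal(loc=mu_0, scale=sigma, size=(n_trials, n))

# z test statistic and two-tailed p-value for each simulated sample.
z = (samples.mean(axis=1) - mu_0) / (sigma / np.sqrt(n))
p_values = 2 * norm.sf(np.abs(z))

# Every rejection here is a Type I error, because the null hypothesis is true.
type_1_rate = np.mean(p_values <= alpha)
print(round(type_1_rate, 3))
```

The printed proportion should land close to the chosen value of alpha, which is what it means for alpha to be the probability of rejecting a true null hypothesis.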
Once the null and alternative hypotheses are written and a level of significance is selected, the next step is for the researcher to collect relevant sample data from the population under study. It’s important that a representative random sample is selected, as discussed in Handling Large Datasets. After the sample data is collected, two calculations are performed: test statistic and p-value.
The test statistic is a numerical value used to assess the strength of evidence against a null hypothesis and is calculated from sample data that is used in hypothesis testing. As noted in Estimating Parameters with Confidence Intervals, a sample statistic is a numerical value calculated from a sample of observations drawn from a larger population such as a sample mean or sample proportion. The sample statistic is converted to a standardized test statistic such as a z-score or t-score based on the assumption that the null hypothesis is true.
Once the test statistic has been determined, a probability, or p-value, is calculated. A p-value is the probability of obtaining a sample statistic with a value as extreme as (or more extreme than) the value determined by the sample data under the assumption that the null hypothesis is true. To calculate a p-value, we will find the area under the relevant probability distribution, such as the area under the normal distribution curve or the area under the t-distribution curve. Since the p-value is a probability, any p-value must always be a numerical value between 0 and 1 inclusive.
When calculating a p-value as the area under the probability distribution curve, the corresponding area will be determined using the location under the curve, which favors the rejection of the null hypothesis as follows:
- If the alternative hypothesis contains a “less than” symbol, the hypothesis test is called a “left tailed” test and the p-value will be calculated as the area to the left of the test statistic.
- If the alternative hypothesis contains a “greater than” symbol, the hypothesis test is called a “right tailed” test and the p-value will be calculated as the area to the right of the test statistic.
- If the alternative hypothesis contains a “not equals” symbol, the hypothesis test is called a “two tailed” test and the p-value will be calculated as the sum of the area to the left of the negative test statistic and the area to the right of the positive test statistic.
The smaller the p-value, the more the sample data deviates from the null hypothesis, and this is more evidence to indicate that the null hypothesis should be rejected.
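The three tail rules above can be sketched as a small Python function for a z test statistic. The function name `p_value_from_z` and the tail labels are hypothetical choices made for this illustration; `norm.cdf` and `norm.sf` are the left-area and right-area functions from `scipy.stats`.

```python
from scipy.stats import norm


def p_value_from_z(z, tail):
    """Return the p-value for a z test statistic.

    tail is "left", "right", or "two", matching the symbol in the
    alternative hypothesis (<, >, or not equals).
    """
    if tail == "left":
        return norm.cdf(z)            # area to the left of z
    if tail == "right":
        return norm.sf(z)             # area to the right of z (1 - cdf)
    if tail == "two":
        return 2 * norm.sf(abs(z))    # both tails, using symmetry
    raise ValueError("tail must be 'left', 'right', or 'two'")


# A test statistic of 0 sits at the center of the curve.
print(p_value_from_z(0.0, "left"))
```

For a t test statistic the same structure applies, with `scipy.stats.t` and the degrees of freedom in place of `norm`.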
The final step in the hypothesis testing procedure is to come to a final decision regarding the null hypothesis. This is accomplished by comparing the p-value with the level of significance and applying the following rule:
- If the p-value $\le$ level of significance, then the decision is to “reject the null hypothesis.”
- If the p-value $>$ level of significance, then the decision is to “fail to reject the null hypothesis.”
Recall that these are the only two possible decisions that a researcher can reach when conducting a hypothesis test.
Often, we would like to translate these decisions into a conclusion that is easier to interpret for someone without a statistical background. For example, a decision to “fail to reject the null hypothesis” might be difficult to interpret for those not familiar with the terminology of hypothesis testing. These decisions can be translated into concluding statements such as those shown in Table 4.8.
Row | Select the decision from the hypothesis test: | Is the claim in the null or the alternative hypothesis? | Then, use this concluding statement: |
---|---|---|---|
Row 1 | Reject the null hypothesis | Claim is the null hypothesis | There is enough evidence to reject the claim. |
Row 2 | Reject the null hypothesis | Claim is the alternative hypothesis | There is enough evidence to support the claim. |
Row 3 | Fail to reject the null hypothesis | Claim is the null hypothesis | There is not enough evidence to reject the claim. |
Row 4 | Fail to reject the null hypothesis | Claim is the alternative hypothesis | There is not enough evidence to support the claim. |
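The decision rule and the four rows of Table 4.8 can be combined in a short sketch. The function names `decide` and `concluding_statement` are hypothetical, introduced only for this illustration.

```python
def decide(p_value, alpha):
    """Apply the decision rule: reject when the p-value is at most alpha."""
    return "reject" if p_value <= alpha else "fail to reject"


def concluding_statement(decision, claim_in):
    """Translate a decision into the wording of Table 4.8.

    decision is "reject" or "fail to reject";
    claim_in is "null" or "alternative".
    """
    amount = "enough" if decision == "reject" else "not enough"
    verb = "reject" if claim_in == "null" else "support"
    return f"There is {amount} evidence to {verb} the claim."


# Illustrative values: p-value 0.02, alpha 0.05, claim in the alternative hypothesis.
print(concluding_statement(decide(0.02, 0.05), "alternative"))
# prints: There is enough evidence to support the claim.
```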
Testing Claims for the Mean When the Population Standard Deviation Is Known
Recall that in the discussion for confidence intervals we examined the confidence interval for the population mean when the population standard deviation is known (normal distribution is used) or when the population standard deviation is unknown (t-distribution is used). We follow the same procedure when conducting hypothesis tests.
In this section, we discuss hypothesis testing for the mean when the population standard deviation is known. Although the population standard deviation is typically unknown, in some cases a reasonable estimate for the population standard deviation can be obtained from past studies or historical data.
Here are the requirements to use this procedure:
- A random sample is selected from the population.
- The sample size is at least 30, or the underlying population is known to follow a normal distribution.
- The population standard deviation ($\sigma$) is known.
When these requirements are met, the normal distribution is used as the basis for conducting the hypothesis test.
For this procedure, the test statistic will be a z-score and the standardized test statistic is calculated as follows:
$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
Where:
$\bar{x}$ is the sample mean.
$\mu_0$ is the hypothesized value of the population mean, which comes from the null hypothesis.
$\sigma$ is the population standard deviation.
$n$ is the sample size.
Example 4.13 provides a step-by-step example to illustrate the hypothesis testing process.
Example 4.13
Problem
A random sample of 100 college students finds that students spend an average of 14.4 hours per week on social media. Use this sample data to test the claim that college students spend at least 15 hours per week on social media. Use a level of significance of 0.05 and assume the population standard deviation is 3.25 hours.
Solution
The first step in the hypothesis testing procedure is to write the null and alternative hypotheses based on the claim. Notice that the claim involves the phrase “at least,” which translates to a greater than or equals symbol. A greater than or equals symbol must be used in the null hypothesis. The complement of greater than or equals is a less than symbol, which will be used in the alternative hypothesis. The claim refers to the population mean time that college students spend on social media, so the symbol $\mu$ will be used. The claim corresponds to the null hypothesis.
Notice that this setup will correspond to Setup C from Table 4.6. Thus the setup of the null and alternative hypotheses will be as follows:
$$H_0: \mu \ge 15$$
$$H_a: \mu < 15$$
Since the population standard deviation is known in this example, the normal distribution will be used to conduct the hypothesis test.
Next, calculate the standardized test statistic:
$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{14.4 - 15}{3.25 / \sqrt{100}} = \frac{-0.6}{0.325} \approx -1.846$$
Notice that since the alternative hypothesis contains a “less than” symbol, the hypothesis test is called a “left tailed” test and the p-value will be calculated as the area to the left of the test statistic. The area under the standard normal curve to the left of a z-score of $-1.846$ is approximately 0.0324. Thus, the p-value for this hypothesis test is 0.0324.
The area under the standard normal curve can be found using software such as Excel, Python, or R.
For example, in Python, the area under the normal curve to the left of a z-score of $-1.846$ can be obtained using the function norm.cdf().
The syntax for using this function is
norm.cdf(x, mean, standard_deviation)
Where:
x is the measurement of interest; in this case, this will be the test statistic of $-1.846$.
mean is the mean of the standard normal distribution, which is 0.
standard_deviation is the standard deviation of the standard normal distribution, which is 1.
Here is the Python code to calculate the p-value for this example:
Python Code
# import the normal distribution object from scipy.stats
from scipy.stats import norm
# define parameters x, mean and standard_deviation:
x = -1.846
mean = 0
standard_deviation = 1
# norm.cdf returns the area to the left of the x-value, which is the p-value for this left-tailed test
# (for a right-tailed test, subtract this result from 1 to obtain the area to the right of the x-value)
# use round function to round answer to 4 decimal places
print(round(norm.cdf(x, mean, standard_deviation), 4))
The resulting output will look like this:
0.0324
The final step in the hypothesis testing procedure is to come to a final decision regarding the null hypothesis. This is accomplished by comparing the p-value with the level of significance. In this example, the p-value is 0.0324 and the level of significance is 0.05: since the p-value $\le$ level of significance, the decision is to “reject the null hypothesis.”
Conclusion: The decision is to reject the null hypothesis; the claim corresponds to the null hypothesis. This scenario corresponds to Row 1 in Table 4.8, which means “there is enough evidence to reject the claim” that college students spend at least 15 hours per week on social media.
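As a cross-check, the whole calculation for Example 4.13 can be carried out from the summary statistics in one short script. This is a sketch that follows the formula for the z test statistic given earlier; the variable names are illustrative.

```python
from math import sqrt
from scipy.stats import norm

# Summary statistics and settings from Example 4.13
x_bar = 14.4     # sample mean (hours per week)
mu_0 = 15        # hypothesized mean from the null hypothesis
sigma = 3.25     # population standard deviation (assumed known)
n = 100          # sample size
alpha = 0.05     # level of significance

# Standardized test statistic
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Left-tailed test: the p-value is the area to the left of z
p_value = norm.cdf(z)

decision = "reject" if p_value <= alpha else "fail to reject"
print(round(z, 3), round(p_value, 4), decision)
```

Working from the unrounded test statistic rather than the rounded value of -1.846 can change the last digit of the p-value slightly, but it does not change the decision here.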
Testing Claims for the Mean When the Population Standard Deviation Is Unknown
In this section, we discuss one sample hypothesis testing for the mean when the population standard deviation is unknown. This is a more common application of hypothesis testing for the mean in that the population standard deviation is typically unknown; however, the sample standard deviation can be calculated based on the sample data.
Here are the requirements to use this procedure:
- A random sample is selected from the population.
- The sample size is at least 30, or the underlying population is known to follow a normal distribution.
- The population standard deviation ($\sigma$) is unknown; the sample standard deviation ($s$) is known.
When these requirements are met, the t-distribution is used as the basis for conducting the hypothesis test.
For this procedure, the test statistic will be a t-score and the standardized test statistic is calculated as follows:
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
Where:
$\bar{x}$ is the sample mean.
$\mu_0$ is the hypothesized value of the population mean, which comes from the null hypothesis.
$s$ is the sample standard deviation.
$n$ is the sample size.
Example 4.14 provides a step-by-step example to illustrate this process.
Example 4.14
Problem
A smartphone manufacturer claims that the mean battery life of its latest smartphone model is 25 hours. A consumer group decides to test this hypothesis and collects a sample of 50 smartphones and determines that the mean battery life of the sample is 24.1 hours with a sample standard deviation of 4.1 hours. Use this sample data to test the claim made by the smartphone manufacturer. Use a level of significance of 0.10 for this analysis.
Solution
The first step in the hypothesis testing procedure is to write the null and alternative hypotheses based on the claim. Notice that the claim involves the phrase “is 25 hours,” where “is” will translate to an equals symbol. An equals symbol must be used in the null hypothesis. The complement of an “equals” symbol is a “not equals” symbol, which will be used in the alternative hypothesis. The claim refers to the population mean battery life of smartphones, so the symbol $\mu$ will be used. The claim corresponds to the null hypothesis. Notice that this setup will correspond to Setup A from Table 4.6. Thus the setup of the null and alternative hypotheses will be as follows:
$$H_0: \mu = 25$$
$$H_a: \mu \ne 25$$
Since the population standard deviation is unknown in this example, the t-distribution will be used to conduct the hypothesis test.
Next, calculate the standardized test statistic as follows:
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{24.1 - 25}{4.1 / \sqrt{50}} \approx \frac{-0.9}{0.5798} \approx -1.552$$
Notice that since the alternative hypothesis contains a “not equals” symbol, the hypothesis test is called a “two tailed” test and the p-value will be calculated as the area to the left of the negative test statistic plus the area to the right of the positive test statistic. The area under the t-distribution curve to the left of a t-score of $-1.552$ is approximately 0.0635 (using degrees of freedom $df = n - 1 = 49$). The area under the t-distribution curve to the right of a t-score of 1.552 is approximately 0.0635. Thus, the p-value for this hypothesis test is the sum of the areas, which is 0.127.
The area under the t-distribution curve can be found using software.
For example, in Python, the area under the t-distribution curve to the left of a t-score of $-1.552$ can be obtained using the function t.cdf().
The syntax for using this function is
t.cdf(x, df)
Where:
x is the measurement of interest; in this case, this will be the test statistic of $-1.552$.
df is the degrees of freedom.
Here is the Python code to calculate the area in the left tail for this example. Doubling this area gives the two-tailed p-value:
Python Code
# import the t-distribution object from scipy.stats
from scipy.stats import t
# define the test statistic x and the degrees of freedom df
x = -1.552
df = 49
# t.cdf returns the area under the t-distribution curve to the left of the x-value
# (subtract this result from 1 to obtain the area to the right of the x-value)
# use round function to round answer to 4 decimal places
print(round(t.cdf(x, df), 4))
The resulting output will look like this:
0.0635
The final step in the hypothesis testing procedure is to come to a final decision regarding the null hypothesis. This is accomplished by comparing the p-value with the level of significance. In this example, the p-value is 0.127 and the level of significance is 0.10: since the p-value $>$ level of significance, the decision is to “fail to reject the null hypothesis.”
Conclusion: The decision is to fail to reject the null hypothesis; the claim corresponds to the null hypothesis. This scenario corresponds to Row 3 in Table 4.8, which means “there is not enough evidence to reject the claim” that the mean battery life of its latest smartphone model is 25 hours.
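The full two-tailed calculation for Example 4.14 can be sketched from the summary statistics as follows. The variable names are illustrative, and `t.sf` (the area to the right of a value) is used with the absolute value of the test statistic so that both tails are counted.

```python
from math import sqrt
from scipy.stats import t

# Summary statistics and settings from Example 4.14
x_bar = 24.1     # sample mean battery life (hours)
mu_0 = 25        # hypothesized mean from the null hypothesis
s = 4.1          # sample standard deviation
n = 50           # sample size
alpha = 0.10     # level of significance

# Standardized test statistic and degrees of freedom
t_stat = (x_bar - mu_0) / (s / sqrt(n))
df = n - 1

# Two-tailed test: double the area beyond |t_stat| in one tail
p_value = 2 * t.sf(abs(t_stat), df)

decision = "reject" if p_value <= alpha else "fail to reject"
print(round(t_stat, 3), round(p_value, 3), decision)
```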
Testing Claims for the Proportion
In this section, we discuss one sample hypothesis testing for proportions. Recall that a sample proportion is measuring “how many out of the total.” For example, a medical researcher might be interested in testing a claim that the proportion of patients experiencing side effects from a new blood pressure–lowering medication is less than 4%.
Here are the requirements to use this procedure:
- A random sample is selected from the population.
- Verify that the normal approximation to the binomial distribution is appropriate by ensuring that $n\hat{p}$ and $n(1 - \hat{p})$ are both at least 5, where $\hat{p}$ represents the sample proportion.
The sample proportion is calculated as the number of successes $x$ divided by the sample size $n$:
$$\hat{p} = \frac{x}{n}$$
When these requirements are met, the normal distribution is used as the basis for conducting the hypothesis test.
For this procedure, the test statistic will be a z-score and the standardized test statistic is calculated as follows:
$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 (1 - p_0)}{n}}}$$
Where:
$\hat{p}$ represents the sample proportion.
$p_0$ is the hypothesized value of the population proportion, which is taken from the null hypothesis.
$n$ is the sample size.
Here is a step-by-step example to illustrate this hypothesis testing process.
Example 4.15
Problem
A college professor claims that the proportion of students using artificial intelligence (AI) tools as part of their coursework is less than 45%. To test the claim, the professor selects a sample of 200 students and surveys the students to determine if they use AI tools as part of their coursework. The results from the survey indicate that 74 out of 200 students use AI tools. Use this sample data to test the claim made by the college professor. Use a level of significance of 0.05 for this analysis.
Solution
The first step in the hypothesis testing procedure is to write the null and alternative hypotheses based on the claim. Notice that the claim involves the phrase “less than,” which will translate to a less than symbol. A less than symbol must be used in the alternative hypothesis. The complement of a “less than” symbol is a “greater than or equal” symbol, which will be used in the null hypothesis. The claim refers to the population proportion of students using AI tools, so the symbol $p$ will be used. Notice that this setup will correspond to Setup F from Table 4.7. Thus, the setup of the null and alternative hypotheses will be as follows:
$$H_0: p \ge 0.45$$
$$H_a: p < 0.45$$
The sample proportion can be calculated as follows:
$$\hat{p} = \frac{x}{n} = \frac{74}{200} = 0.37$$
Verify that the normal approximation to the binomial distribution is appropriate by ensuring that $n\hat{p}$ and $n(1 - \hat{p})$ are both at least 5, where $\hat{p}$ represents the sample proportion.
For this example, $n\hat{p} = 200(0.37) = 74$, and $n(1 - \hat{p}) = 200(0.63) = 126$. Both of these results are at least 5, which verifies that the normal approximation to the binomial distribution is appropriate.
Since the requirements for a hypothesis test for a proportion have been met, the normal distribution will be used to conduct the hypothesis test.
Next, calculate the standardized test statistic as follows:
$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 (1 - p_0)}{n}}} = \frac{0.37 - 0.45}{\sqrt{\dfrac{0.45 (0.55)}{200}}} \approx \frac{-0.08}{0.03518} \approx -2.274$$
Notice that since the alternative hypothesis contains a “less than” symbol, the hypothesis test is called a “left tailed” test and the p-value will be calculated as the area to the left of the test statistic. The area under the normal curve to the left of a z-score of $-2.274$ is approximately 0.012. Thus, the p-value for this hypothesis test is 0.012.
Here is the Python code to calculate the p-value for this example:
Python Code
# import the normal distribution object from scipy.stats
from scipy.stats import norm
# define parameters x, mean, and standard_deviation:
x = -2.274
mean = 0
standard_deviation = 1
# norm.cdf returns the area to the left of the x-value, which is the p-value for this left-tailed test
# (for a right-tailed test, subtract this result from 1 to obtain the area to the right of the x-value)
# use round function to round answer to 4 decimal places
print(round(norm.cdf(x, mean, standard_deviation), 4))
The resulting output will look like this:
0.0115
The final step in the hypothesis testing procedure is to come to a final decision regarding the null hypothesis. This is accomplished by comparing the p-value with the level of significance. In this example, the p-value is 0.012 and the level of significance is 0.05: since the p-value $\le$ level of significance, the decision is to “reject the null hypothesis.”
Conclusion: The decision is to reject the null hypothesis; the claim corresponds to the alternative hypothesis. This scenario corresponds to Row 2 in Table 4.8, which means “there is enough evidence to support the claim” that the proportion of students using artificial intelligence (AI) tools as part of their coursework is less than 45%.
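The steps of Example 4.15, including the check on the normal approximation, can be sketched from the raw counts. The variable names are illustrative and the calculation follows the formula for the z test statistic for a proportion given earlier.

```python
from math import sqrt
from scipy.stats import norm

# Data and settings from Example 4.15
successes = 74   # students who use AI tools
n = 200          # sample size
p_0 = 0.45       # hypothesized proportion from the null hypothesis
alpha = 0.05     # level of significance

# Sample proportion
p_hat = successes / n

# Requirement check: both counts must be at least 5
approximation_ok = n * p_hat >= 5 and n * (1 - p_hat) >= 5

# Standardized test statistic (the null value p_0 is used in the standard error)
z = (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)

# Left-tailed test: the p-value is the area to the left of z
p_value = norm.cdf(z)

decision = "reject" if p_value <= alpha else "fail to reject"
print(approximation_ok, round(z, 3), round(p_value, 4), decision)
```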
Using Python to Conduct Hypothesis Tests
Python provides several functions to assist with hypothesis testing. For example, the ztest() function can be used for hypothesis testing for the mean when the population standard deviation is known, and the ttest_1samp() function can be used for hypothesis testing for the mean when the population standard deviation is unknown. See the example:
Example 4.16
Problem
A medical researcher wants to test a claim that the average oxygen level for females is greater than 75 mm Hg. In order to test the claim, a sample of 10 female patients is selected and the following oxygen levels are recorded:
76, 75, 74, 81, 76, 77, 71, 74, 73, 79
Use the ttest_1samp() function in Python to conduct a hypothesis test to test the claim. Assume a level of significance of 0.05.
Solution
The first step in the hypothesis testing procedure is to write the null and alternative hypotheses based on the claim. Notice that the claim involves the phrase “greater than 75 mm Hg,” where “greater than” will translate to a greater than symbol. A greater than symbol must be used in the alternative hypothesis. The complement of a “greater than” symbol is a “less than or equals” symbol, which will be used in the null hypothesis. The claim refers to the population mean oxygen level, so the symbol $\mu$ will be used. The claim corresponds to the alternative hypothesis. Notice that this setup will correspond to Setup B from Table 4.6. Thus the setup of the null and alternative hypotheses will be as follows:
$$H_0: \mu \le 75$$
$$H_a: \mu > 75$$
Since the population standard deviation is unknown in this example, the t-distribution will be used to conduct the hypothesis test.
To use the ttest_1samp() function in Python, the syntax is as follows:
ttest_1samp(data_array, null_hypothesis_mean)
The function will then return the value of the test statistic and the two-tailed p-value.
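In this example we are interested in the right-tailed p-value. Because the sample mean (75.6) is above the hypothesized mean, the test statistic is positive and the right-tailed p-value is one-half of the two-tailed p-value, so the Python program divides the p-value returned by the function by 2.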
Here is the Python code for this example:
Python Code
import scipy.stats as stats
import numpy as np
# Enter data array
oxygen_level = np.array([76, 75, 74, 81, 76, 77, 71, 74, 73, 79])
# Hypothesized population mean
null_hypothesis_mean = 75
# Perform one-sample t-test
test_stat, two_tailed_p_value = stats.ttest_1samp(oxygen_level, null_hypothesis_mean)
# Obtain the one-tailed p-value (halving is valid here because the test statistic is positive)
one_tailed_p_value = two_tailed_p_value / 2
# print out test statistic and one-tailed p-value
print("Test Statistic = ", test_stat)
print("p-value = ", one_tailed_p_value)
The resulting output will look like this:
Test Statistic = 0.6512171447631733
p-value = 0.2655900985964262
The final step in the hypothesis testing procedure is to come to a final decision regarding the null hypothesis. This is accomplished by comparing the p-value with the level of significance. In this example, the p-value is 0.266 and the level of significance is 0.05: since the p-value $>$ level of significance, the decision is to “fail to reject the null hypothesis.”
Conclusion: The decision is to fail to reject the null hypothesis; the claim corresponds to the alternative hypothesis. This scenario corresponds to Row 4 in Table 4.8, which means “there is not enough evidence to support the claim” that the average oxygen level for female patients is greater than 75 mm Hg.
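As a cross-check on the function output, the same test statistic and right-tailed p-value can be computed directly from the one-sample formula. The following is a minimal sketch, assuming the standardized statistic is the difference between the sample mean and the hypothesized mean divided by $s/\sqrt{n}$, with $n - 1$ degrees of freedom; the variable names are illustrative.

```python
import numpy as np
import scipy.stats as stats

# Sample data and hypothesized mean from the oxygen level example
oxygen_level = np.array([76, 75, 74, 81, 76, 77, 71, 74, 73, 79])
null_hypothesis_mean = 75

n = len(oxygen_level)
sample_mean = oxygen_level.mean()
sample_sd = oxygen_level.std(ddof=1)  # sample standard deviation (divides by n - 1)

# Standardized test statistic
t_stat = (sample_mean - null_hypothesis_mean) / (sample_sd / np.sqrt(n))

# Right-tailed p-value: area to the right of t_stat with n - 1 degrees of freedom
right_tailed_p = stats.t.sf(t_stat, df=n - 1)

print("Test Statistic = %.3f" % t_stat)
print("p-value = %.3f" % right_tailed_p)
```

Using the survival function stats.t.sf() gives the right-tail area directly, so no halving step is needed and the result remains valid even when the test statistic is negative.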
Hypothesis Testing for Two Samples
In the previous section, we examined methods to conduct hypothesis testing for one sample. Researchers and data scientists are also often interested in conducting hypothesis tests for two samples. A researcher might want to compare two means or two proportions to determine if two groups are significantly different from one another. For example, a medical researcher might be interested in comparing the mean blood pressure between two groups of individuals: one group that is taking a blood pressure–lowering medication and another group that is taking a placebo. A government researcher might be interested in comparing the proportion of smokers for male versus female adults.
The same general hypothesis testing procedure will be used for two samples, but the setup of the null and alternative hypotheses will be structured to reflect the comparison of two means or two proportions.
As a reminder, the general hypothesis testing method introduced in the previous section includes the following steps:
- Set up a null and an alternative hypothesis based on the claim.
- Collect relevant sample data to investigate the claim.
- Determine the correct distribution to be used to analyze the hypothesis test (e.g., normal distribution or t-distribution).
- Analyze the sample data to determine if the sample data is consistent with the null hypothesis. This will involve statistical calculations involving a test statistic and a p-value, which are discussed next.
- If the sample data is inconsistent with the null hypothesis, then the decision is to “reject the null hypothesis.” If the sample data is consistent with the null hypothesis, then the decision is to “fail to reject the null hypothesis.”
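The final decision step can be expressed as a small helper. This is a sketch only: the function name decide is hypothetical, and it assumes the common convention of rejecting the null hypothesis when the p-value is less than or equal to the level of significance.

```python
def decide(p_value, alpha):
    """Return the decision about the null hypothesis for a given p-value and significance level."""
    # Assumed convention: reject when the p-value does not exceed alpha
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

# p-value and level of significance from the one-sample oxygen level example
print(decide(0.266, 0.05))
```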
As discussed for one-sample hypothesis testing, when writing the null and alternative hypotheses, start off by translating the claim into a mathematical statement involving the unknown population parameters. The claim could correspond to either the null or the alternative hypothesis. Then write the complement of this statement in order to write the other hypothesis. The difference now as compared to the previous section is that the hypotheses will be comparing two means or two proportions instead of just one mean or one proportion.
When translating claims about population means, use the symbols $\mu_1$ and $\mu_2$ as part of the null and alternative hypotheses to represent the two unknown population means. When translating claims about population proportions, use the symbols $p_1$ and $p_2$ as part of the null and alternative hypotheses to represent the two unknown population proportions. Refer to Table 4.9 and Table 4.10 for the various setups to construct the null and alternative hypotheses for two-sample hypothesis testing.
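Table 4.9 (setups for comparing two means)
Setup A | Setup B | Setup C |
---|---|---|
$H_0: \mu_1 = \mu_2$ | $H_0: \mu_1 \le \mu_2$ | $H_0: \mu_1 \ge \mu_2$ |
$H_a: \mu_1 \ne \mu_2$ | $H_a: \mu_1 > \mu_2$ | $H_a: \mu_1 < \mu_2$ |

Table 4.10 (setups for comparing two proportions)
Setup D | Setup E | Setup F |
---|---|---|
$H_0: p_1 = p_2$ | $H_0: p_1 \le p_2$ | $H_0: p_1 \ge p_2$ |
$H_a: p_1 \ne p_2$ | $H_a: p_1 > p_2$ | $H_a: p_1 < p_2$ |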
Comparing Two Means
In this section, we discuss two-sample hypothesis testing for the mean when the population standard deviations are unknown. Typically, in real-world applications, population standard deviations are unknown, so this method has considerable practical appeal.
One other consideration when conducting hypothesis testing for two samples is to determine if the two samples are dependent or independent. Two samples are considered to be independent samples when the sample values from one population are not related to the sample values taken from the second population. In other words, there is not an inherent paired or matched relationship between the two samples. Two samples are considered to be dependent samples if the samples from one population can be paired or matched to the samples taken from the second population. Dependent samples are sometimes also called paired samples.
For example, if a researcher were to collect data for the ages for a random sample of 15 men and a random sample of 15 women, there would be no inherent relationship between these two samples, and so these samples are considered independent samples.
On the other hand, let’s say a researcher were to collect data on the ages of a random sample of 15 adults and also collect data on the ages of those adults’ corresponding (15) spouses/partners; these would be considered dependent samples in that there is an inherent pairing of domestic partners.
Example 4.17
Problem
Determine if the following samples are dependent or independent samples:
Sample #1: Scores on a statistics exam for 40 women college students
Sample #2: Scores on a statistics exam for 35 men college students
Solution
These two samples are independent because there is no inherent match-up between the women students from Sample #1 and the men students from Sample #2; it is not possible to analyze either of the samples as dependent, or paired, samples.
Example 4.18
Problem
Determine if the following samples are dependent or independent samples:
Sample #1: Blood pressure readings on a sample of 50 patients in a clinical trial before taking any medication
Sample #2: Blood pressure readings on the same group of 50 patients after taking a blood pressure–lowering medication
Solution
These two samples are dependent. The samples can be paired as before/after blood pressure readings for the same patient.
Here are the requirements to use the two-sample hypothesis test for means with independent samples:
- The samples are random and independent.
- Both sample sizes are at least 30, or the underlying populations are known to follow a normal distribution.
- The population standard deviations ($\sigma_1$, $\sigma_2$) are unknown; the sample standard deviations ($s_1$, $s_2$) are known.
When these requirements are met, the t-distribution is used as the basis for conducting the hypothesis test.
In this discussion, we assume that the population variances are not equal, which leads to the following formula for the test statistic.
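For this procedure, the test statistic will be a t-score and the standardized test statistic is calculated as follows:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Where:
$\bar{x}_1$, $\bar{x}_2$ are the sample means.
$\mu_1$, $\mu_2$ are the hypothesized values of the population means.
$s_1$, $s_2$ are the sample standard deviations.
$n_1$, $n_2$ are the sample sizes.
Note: For the t-distribution, the degrees of freedom will be the smaller of $n_1 - 1$ and $n_2 - 1$.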
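The calculation can be sketched as a short Python function. This is an illustration under stated assumptions: it uses the unpooled form, in which the difference in sample means is divided by $\sqrt{s_1^2/n_1 + s_2^2/n_2}$, together with the smaller of $n_1 - 1$ and $n_2 - 1$ as the degrees of freedom. The function name two_sample_t and the summary statistics passed to it are hypothetical.

```python
import math
import scipy.stats as stats

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2, hypothesized_diff=0.0):
    """Unpooled two-sample t statistic with conservative degrees of freedom."""
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    t_stat = ((mean1 - mean2) - hypothesized_diff) / standard_error
    df = min(n1 - 1, n2 - 1)
    return t_stat, df

# Hypothetical summary statistics, for illustration only
t_stat, df = two_sample_t(mean1=52.0, sd1=6.0, n1=36, mean2=50.0, sd2=8.0, n2=64)

# p-values for each type of alternative hypothesis
left_tailed_p = stats.t.cdf(t_stat, df)            # alternative contains "<"
right_tailed_p = stats.t.sf(t_stat, df)            # alternative contains ">"
two_tailed_p = 2 * stats.t.sf(abs(t_stat), df)     # alternative contains "not equal"

print("t = %.3f, df = %d" % (t_stat, df))
print("left = %.3f, right = %.3f, two-tailed = %.3f" % (left_tailed_p, right_tailed_p, two_tailed_p))
```

Which of the three p-values applies depends on the symbol in the alternative hypothesis.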
Example 4.19 illustrates this hypothesis testing process.
Example 4.19
Problem
A software engineer at a communications company claims that the average salary of women engineers is less than that of men engineers. To test the claim, a human resource administrator selects a random sample of 45 women engineers and 50 men engineers and collects statistical data as shown in Table 4.11. (Dollar amounts are in thousands of dollars.)
 | Women Engineers | Men Engineers |
---|---|---|
Sample Mean | $\bar{x}_1 = 72.7$ | $\bar{x}_2 = 76.9$ |
Sample Standard Deviation | $s_1 = 9.8$ | $s_2 = 10.7$ |
Sample Size | $n_1 = 45$ | $n_2 = 50$ |
Use this sample data to test the claim made by the software engineer. Use a level of significance of 0.05 for this analysis.
Solution
The first step in the hypothesis testing procedure is to write the null and alternative hypotheses based on the claim. Notice that the claim mentions “average salary of women engineers is less than that of men engineers,” where “less than” will translate to a less than symbol. A less than symbol must be used in the alternative hypothesis. The complement of a “less than” symbol is a “greater than or equals” symbol, which will be used in the null hypothesis. Let $\mu_1$ represent the population mean salary for women engineers and let $\mu_2$ represent the population mean salary for men engineers. The null hypothesis can be considered to represent that we expect there to be “no difference” between the salaries of women and men engineers. Note that this setup will correspond to Setup C from Table 4.9. The claim corresponds to the alternative hypothesis. Thus, the setup of the null and alternative hypotheses will be as follows:
$$H_0: \mu_1 \ge \mu_2$$
$$H_a: \mu_1 < \mu_2$$
Since the population standard deviations are unknown in this example, the t-distribution will be used to conduct the hypothesis test.
Next, calculate the standardized test statistic as follows (dollar amounts are in thousands, and the hypothesized difference $\mu_1 - \mu_2$ is 0):
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(72.7 - 76.9) - 0}{\sqrt{\dfrac{9.8^2}{45} + \dfrac{10.7^2}{50}}} \approx -1.997$$
Notice that since the alternative hypothesis contains a “less than” symbol, the hypothesis test is called a “left tailed” test and the p-value will be calculated as the area under the t-distribution curve to the left of the test statistic. The area under the t-distribution curve to the left of a t-score of $-1.997$ (using degrees of freedom $df = 44$) is approximately 0.026. Thus, the p-value for this hypothesis test is 0.026.
The final step in the hypothesis testing procedure is to come to a final decision regarding the null hypothesis. This is accomplished by comparing the p-value with the level of significance. In this example, the p-value is 0.026 and the level of significance is 0.05: since the p-value is less than the level of significance, the decision is to “reject the null hypothesis.”
Conclusion: The decision is to reject the null hypothesis; the claim corresponds to the alternative hypothesis. This scenario corresponds to Row 2 in Table 4.8, and thus “there is enough evidence to support the claim” that the average salary of women engineers is less than that of men engineers.
This problem can also be solved using the Python function ttest_ind_from_stats(). There is also a Python function, ttest_ind(), that performs a two-sample t-test given raw data for the two groups.
For the function ttest_ind_from_stats(), the syntax is as follows:
Python Code
stats.ttest_ind_from_stats(sample_mean1, sample_standard_deviation1, sample_size1, sample_mean2, sample_standard_deviation2, sample_size2)
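Here is the Python code that can be used to solve Example 4.19. Note that this function returns the two-tailed p-value. Since Example 4.19 involves a left-tailed test and the test statistic is negative, the desired p-value is one-half of the two-tailed p-value. Also note that, by default, ttest_ind_from_stats() pools the two sample variances (equal_var=True), so its output can differ slightly from a hand calculation based on the unpooled formula; passing equal_var=False requests the unpooled (Welch) version of the test.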
Python Code
import scipy.stats as stats
import numpy as np
#Example 4.19 - Two Sample t-test given sample means, sample standard deviations
sample_mean1 = 72700
sample_standard_deviation1 = 9800
sample_size1 = 45
sample_mean2 = 76900
sample_standard_deviation2 = 10700
sample_size2 = 50
test_statistic, two_tailed_p_value = stats.ttest_ind_from_stats(sample_mean1, sample_standard_deviation1, sample_size1, sample_mean2, sample_standard_deviation2, sample_size2)
left_tailed_p_value = two_tailed_p_value / 2
print("test statistic = ", "%.3f" % test_statistic)
print("p-value = ", "%.3f" % left_tailed_p_value)
The resulting output will look like this:
test statistic = -1.988
p-value = 0.025
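Because the discussion in this section assumes unequal population variances, it can be instructive to compare the pooled default with the unpooled calculation. The sketch below is illustrative: it computes the unpooled statistic by hand with the conservative degrees of freedom (the smaller of $n_1 - 1$ and $n_2 - 1$), and then calls ttest_ind_from_stats() with equal_var=False, which uses Welch's approximation for the degrees of freedom. The variable names are assumptions made for this example.

```python
import math
import scipy.stats as stats

# Summary statistics from Example 4.19 (in dollars)
mean1, sd1, n1 = 72700, 9800, 45    # women engineers
mean2, sd2, n2 = 76900, 10700, 50   # men engineers

# Hand calculation: unpooled standard error, conservative degrees of freedom
standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
t_hand = (mean1 - mean2) / standard_error
p_hand = stats.t.cdf(t_hand, df=min(n1 - 1, n2 - 1))   # left-tailed area

# SciPy with unpooled variances (Welch's t-test)
t_welch, two_tailed_p = stats.ttest_ind_from_stats(
    mean1, sd1, n1, mean2, sd2, n2, equal_var=False
)
p_welch = two_tailed_p / 2   # left-tailed; valid here because the statistic is negative

print("hand calculation: t = %.3f, p-value = %.3f" % (t_hand, p_hand))
print("Welch (SciPy):    t = %.3f, p-value = %.3f" % (t_welch, p_welch))
```

The two test statistics agree; the p-values differ slightly because Welch's approximation yields more degrees of freedom than the conservative rule. Both p-values are below 0.05, so the decision to reject the null hypothesis is unchanged.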
Comparing Matched Pairs Data
Hypothesis Testing for Two Samples discussed hypothesis testing for two independent samples. In this section, we turn our attention to hypothesis testing for dependent samples, often called matched pairs, paired samples, or paired data.
Recall that two samples are considered to be dependent if the samples from one population can be paired, or matched, to the samples taken from the second population.
This analysis method is often used to analyze before/after data to determine if a significant change has occurred. In a clinical trial, a medical researcher may be interested in comparing before and after blood pressure readings on a patient to assess the efficacy of a blood pressure–lowering medication. Or an employer might compare scores on a safety exam for a sample of employees before and after employees have taken a course on safety procedures and practices. A parent might want to compare before/after SAT scores for students who have taken an SAT preparation course. This type of analysis can help to assess the effectiveness and value of such training.
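When conducting a hypothesis test for paired data such as before/after data, a key step will be to calculate the “difference” for each pair of data values, as follows:
$$d = \text{data value from the first sample} - \text{data value from the second sample}$$
For before/after data, this is the “before” value minus the “after” value.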
Note that when calculating these differences, the result can be either a positive or a negative value.
Here are the requirements to use this procedure:
- The samples are random and dependent (i.e., paired data).
- The populations are normally distributed or the number of data pairs is at least 30.
When these requirements are met, the t-distribution is used as the basis for conducting the hypothesis test.
The test statistic will be the average of the differences across the data pairs (labeled as $\bar{d}$), where:
$$\bar{d} = \frac{\sum d}{n}$$
For this procedure, the standardized test statistic is calculated as follows:
$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$$
Where:
$\bar{d}$ is the mean of the differences between paired data.
$\mu_d$ is the hypothesized population mean of the differences.
$s_d$ is the standard deviation of the differences between paired data.
$n$ is the number of data pairs.
Note: For the t-distribution, the degrees of freedom will be $n - 1$, where $n$ is the number of data pairs.
The setup of the null and alternative hypotheses will follow one of the setups shown in Table 4.12:
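Setup A | Setup B | Setup C |
---|---|---|
$H_0: \mu_d = 0$ | $H_0: \mu_d \ge 0$ | $H_0: \mu_d \le 0$ |
$H_a: \mu_d \ne 0$ | $H_a: \mu_d < 0$ | $H_a: \mu_d > 0$ |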
Example 4.20 illustrates this hypothesis testing process.
Example 4.20
Problem
A pharmaceutical company is testing a cholesterol-lowering medication in a clinical trial. The company claims that the medication helps to lower cholesterol levels. To test the claim, a group of 10 patients is selected for a clinical trial. Before and after cholesterol measurements are taken on the same patients, where the “after” measurement is recorded after the patient has taken the medication for a six-month time period as shown in Table 4.13.
Patient | Cholesterol Level Before Taking the Medication | Cholesterol Level After Taking the Medication | Difference (Before Reading - After Reading) |
---|---|---|---|
1 | 218 | 210 | 8 |
2 | 232 | 241 | -9 |
3 | 259 | 223 | 36 |
4 | 265 | 244 | 21 |
5 | 248 | 227 | 21 |
6 | 298 | 273 | 25 |
7 | 263 | 252 | 11 |
8 | 281 | 276 | 5 |
9 | 290 | 281 | 9 |
10 | 271 | 259 | 12 |
Solution
The first step in the hypothesis testing procedure is to write the null and alternative hypotheses based on the claim. Notice that the company claims that the medication helps to lower cholesterol levels. If the medication did reduce cholesterol levels, then we would expect the after data to be less than the before data. In this case, the difference for “before - after” would be a positive value, which is greater than zero. A greater than symbol must be used in the alternative hypothesis. The complement of a “greater than” symbol is a “less than or equals” symbol, which will be used in the null hypothesis. This setup will correspond to Setup C from Table 4.12. The claim corresponds to the alternative hypothesis. Thus the setup of the null and alternative hypotheses will be as follows:
$$H_0: \mu_d \le 0$$
$$H_a: \mu_d > 0$$
The mean of the differences, $\bar{d}$, is the average of the differences column shown in Table 4.13. The mean of the differences column is $\bar{d} = 13.9$, and the standard deviation of the differences column is $s_d = 12.414$.
Next, calculate the standardized test statistic as follows:
$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} = \frac{13.9 - 0}{12.414 / \sqrt{10}} \approx 3.541$$
Notice that since the alternative hypothesis contains a “greater than” symbol, the hypothesis test is called a “right tailed” test and the p-value will be calculated as the area under the t-distribution curve to the right of the test statistic. The area under the t-distribution curve to the right of a t-score of 3.541 (using degrees of freedom $df = n - 1 = 9$) is approximately 0.003. Thus the p-value for this hypothesis test is 0.003.
The final step in the hypothesis testing procedure is to come to a final decision regarding the null hypothesis. This is accomplished by comparing the p-value with the level of significance. In this example, the p-value is 0.003 and the level of significance is 0.05: since the p-value is less than the level of significance, the decision is to “reject the null hypothesis.”
Conclusion: The decision is to reject the null hypothesis; the claim corresponds to the alternative hypothesis. This scenario corresponds to Row 2 in Table 4.8, and “there is enough evidence to support the claim that the medication helps to lower cholesterol levels.”
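Example 4.20 can also be worked in Python. The following is a minimal sketch that uses the SciPy function ttest_rel(), which performs a paired t-test on two related samples and returns the test statistic and the two-tailed p-value. The array names are illustrative, and the manual calculation is included only as a cross-check.

```python
import numpy as np
import scipy.stats as stats

# Cholesterol readings from Table 4.13, in patient order
before = np.array([218, 232, 259, 265, 248, 298, 263, 281, 290, 271])
after = np.array([210, 241, 223, 244, 227, 273, 252, 276, 281, 259])

# Paired t-test on the (before - after) differences; returns the two-tailed p-value
t_stat, two_tailed_p = stats.ttest_rel(before, after)

# Right-tailed test: halve the two-tailed p-value (valid because the statistic is positive)
right_tailed_p = two_tailed_p / 2

# Cross-check using the mean and standard deviation of the differences
differences = before - after
t_manual = differences.mean() / (differences.std(ddof=1) / np.sqrt(len(differences)))

print("test statistic = %.3f" % t_stat)
print("p-value = %.3f" % right_tailed_p)
```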
Testing Claims for Two Proportions
Data scientists and researchers are often interested in comparing two proportions. For example, a medical researcher might be interested in comparing the percentage of people contracting a respiratory virus for those taking a vaccine versus not taking a vaccine.
In this section, we discuss two-sample hypothesis testing for proportions, where the notation $p_1$ refers to the population proportion for the first group and $p_2$ refers to the population proportion for the second group.
A researcher will take samples from the two populations and then calculate two sample proportions ($\hat{p}_1$, $\hat{p}_2$) as follows:
$$\hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2}$$
Where:
$x_1$ is the number of successes in the first sample.
$n_1$ is the sample size for the first sample.
$x_2$ is the number of successes in the second sample.
$n_2$ is the sample size for the second sample.
$\hat{p}_1$ is the sample proportion for the first sample.
$\hat{p}_2$ is the sample proportion for the second sample.
It is also useful to calculate a weighted estimate ($\bar{p}$) for $p_1$ and $p_2$, as follows:
$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
Here are the requirements to use this procedure:
- The samples are random and independent.
- Sample sizes are large enough to use a normal distribution as an approximation for a binomial distribution. To confirm this, ensure that the following four quantities are all at least 5: $n_1\bar{p}$, $n_1(1 - \bar{p})$, $n_2\bar{p}$, $n_2(1 - \bar{p})$.
When these requirements are met, the normal distribution is used as the basis for conducting the hypothesis test.
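For this procedure, the test statistic will be a z-score, and the standardized test statistic is calculated as follows:
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}(1 - \bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$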
Here is a step-by-step example to illustrate this hypothesis testing process.
Example 4.21
Problem
A drug is under investigation to address lung cancer. As part of a clinical trial, 250 people took the drug and 230 people took a placebo (a substance with no therapeutic effect). From the results of the study, 211 people taking the drug were cancer free, and 172 taking the placebo were cancer free. At a level of significance of 0.01, can you support the claim by the pharmaceutical company that the proportion of subjects who are cancer free after taking the actual cancer drug is greater than the proportion for those who took the placebo?
Solution
The first step in the hypothesis testing procedure is to write the null and alternative hypotheses based on the claim. Notice that the claim mentions “proportion of subjects who are cancer free after taking the actual cancer drug is greater than the proportion for those who took the placebo,” where “greater than” will translate to a greater than symbol. A greater than symbol must be used in the alternative hypothesis. The complement of a “greater than” symbol is a “less than or equals” symbol, which will be used in the null hypothesis. Let $p_1$ represent the proportion of people who take the medication and are cancer free, and let $p_2$ represent the proportion of people who take the placebo and are cancer free. The null hypothesis assumes there is no difference in the two proportions such that $p_1$ is equal to $p_2$ (which can also be considered as $p_1 - p_2 = 0$). This setup will correspond to Setup E from Table 4.10. The claim corresponds to the alternative hypothesis. Thus the setup of the null and alternative hypotheses will be as follows:
$$H_0: p_1 \le p_2$$
$$H_a: p_1 > p_2$$
First, check that the requirements for this hypothesis test are met.
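Calculate the weighted estimate ($\bar{p}$) for $p_1$ and $p_2$, as follows:
$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{211 + 172}{250 + 230} = \frac{383}{480} \approx 0.798$$
Check that the following four quantities are all at least 5: $n_1\bar{p} \approx 199.5$, $n_1(1 - \bar{p}) \approx 50.5$, $n_2\bar{p} \approx 183.5$, $n_2(1 - \bar{p}) \approx 46.5$.
Since these results are all at least 5, the requirements are met.
Calculate the sample proportions as follows:
$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{211}{250} = 0.844, \qquad \hat{p}_2 = \frac{x_2}{n_2} = \frac{172}{230} \approx 0.748$$
Next, calculate the standardized test statistic as follows (carrying unrounded intermediate values):
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}(1 - \bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{\left(\dfrac{211}{250} - \dfrac{172}{230}\right) - 0}{\sqrt{\dfrac{383}{480}\left(1 - \dfrac{383}{480}\right)\left(\dfrac{1}{250} + \dfrac{1}{230}\right)}} \approx 2.621$$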
Notice that since the alternative hypothesis contains a “greater than” symbol, the hypothesis test is called a “right tailed” test and the p-value will be calculated as the area under the normal distribution curve to the right of the test statistic. The area under the normal distribution curve to the right of a z-score of 2.621 is approximately 0.004. Thus the p-value for this hypothesis test is 0.004.
The final step in the hypothesis testing procedure is to come to a final decision regarding the null hypothesis. This is accomplished by comparing the p-value with the level of significance. In this example, the p-value is 0.004 and the level of significance is 0.01: since the p-value is less than the level of significance, the decision is to “reject the null hypothesis.”
Conclusion: The decision is to reject the null hypothesis; the claim corresponds to the alternative hypothesis. This scenario corresponds to Row 2 in Table 4.8, and “there is enough evidence to support the claim” that the proportion of subjects who are cancer free after taking the actual cancer drug is greater than the proportion for those who took the placebo. This suggests that the medication is effective in addressing lung cancer.
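The calculations in Example 4.21 can be reproduced in Python. This is a minimal sketch that applies the two-proportion z formula directly, using the pooled estimate $(x_1 + x_2)/(n_1 + n_2)$ in the standard error, and uses scipy.stats.norm for the right-tail area. The variable names are illustrative.

```python
import math
import scipy.stats as stats

# Data from Example 4.21
x1, n1 = 211, 250   # cancer free and total, drug group
x2, n2 = 172, 230   # cancer free and total, placebo group

# Sample proportions and weighted (pooled) estimate
p_hat1 = x1 / n1
p_hat2 = x2 / n2
p_bar = (x1 + x2) / (n1 + n2)

# Requirement check: all four quantities should be at least 5
quantities = [n1 * p_bar, n1 * (1 - p_bar), n2 * p_bar, n2 * (1 - p_bar)]
requirements_met = all(q >= 5 for q in quantities)

# Standardized test statistic (hypothesized difference p1 - p2 is 0)
standard_error = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
z = (p_hat1 - p_hat2) / standard_error

# Right-tailed p-value: area under the normal curve to the right of z
p_value = stats.norm.sf(z)

print("requirements met:", requirements_met)
print("test statistic = %.3f" % z)
print("p-value = %.3f" % p_value)
```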