10.1 Two Population Means with Unknown Standard Deviations

  1. The two independent samples are simple random samples from two distinct populations.
  2. For the two distinct populations:
    • if the sample sizes are small, the distributions are important (should be normal), and
    • if the sample sizes are large, the distributions are not important (need not be normal).
The test comparing two independent population means with unknown and possibly unequal population standard deviations is called the Aspin-Welch t-test. The degrees of freedom formula was developed by Aspin and Welch.

The comparison of two population means is very common. A difference between the two samples depends on both the means and the standard deviations. Very different means can occur by chance if there is great variation among the individual samples. To account for the variation, we take the difference of the sample means, $\bar{X}_1 - \bar{X}_2$, and divide by the standard error to standardize the difference. The result is a t-score test statistic.

Because we do not know the population standard deviations, we estimate them using the two sample standard deviations from our independent samples. For the hypothesis test, we calculate the estimated standard deviation, or standard error, of the difference in sample means, $\bar{X}_1 - \bar{X}_2$.

The standard error is calculated as follows:

$$\sqrt{\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}}$$

The test statistic (t-score) is calculated as follows:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}}}$$
where
  • s1 and s2, the sample standard deviations, are estimates of σ1 and σ2, respectively,
  • σ1 and σ2 are the unknown population standard deviations,
  • $\bar{x}_1$ and $\bar{x}_2$ are the sample means, and
  • μ1 and μ2 are the population means.
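
As a rough illustration of the two formulas above, the following Python sketch computes the standard error and the t-score from summary statistics. The function names and the `null_diff` parameter are illustrative assumptions, not part of the text; the numbers are the summary statistics from Table 10.1 later in this section, with girls as group 1 and boys as group 2.

```python
import math

def standard_error(s1, n1, s2, n2):
    """Estimated standard deviation of (x1_bar - x2_bar); variances are not pooled."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def t_score(xbar1, xbar2, s1, n1, s2, n2, null_diff=0.0):
    """Test statistic for H0: mu1 - mu2 = null_diff (hypothetical parameter name)."""
    return ((xbar1 - xbar2) - null_diff) / standard_error(s1, n1, s2, n2)

# Summary statistics from Table 10.1: girls are group 1, boys are group 2.
se = standard_error(0.866, 9, 1.00, 16)
t = t_score(2.0, 3.2, 0.866, 9, 1.00, 16)
print(round(se, 3), round(t, 2))  # about 0.382 and -3.14
```

Under the null hypothesis the term μ1 − μ2 is usually 0, which is why `null_diff` defaults to 0 in this sketch.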

The number of degrees of freedom (df) requires a somewhat complicated calculation. However, a computer or calculator calculates it easily. The df are not always a whole number. The test statistic calculated previously is approximated by the Student’s t-distribution with df as follows:

Degrees of freedom

$$df = \frac{\left(\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}\right)^2}{\left(\frac{1}{n_1 - 1}\right)\left(\frac{(s_1)^2}{n_1}\right)^2 + \left(\frac{1}{n_2 - 1}\right)\left(\frac{(s_2)^2}{n_2}\right)^2}$$
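
The degrees of freedom formula is easier to follow when each variance term is named. The sketch below is one possible transcription into Python; `welch_df` is a hypothetical helper name, and the inputs are again the summary statistics from Table 10.1.

```python
def welch_df(s1, n1, s2, n2):
    """Degrees of freedom for the Aspin-Welch t-test (generally not a whole number)."""
    v1 = s1**2 / n1  # (s1)^2 / n1
    v2 = s2**2 / n2  # (s2)^2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Girls: s = 0.866, n = 9; boys: s = 1.00, n = 16 (Table 10.1).
df = welch_df(0.866, 9, 1.00, 16)
print(round(df, 2))  # about 18.85
```

The result always lies between the smaller of n1 − 1 and n2 − 1 and the pooled value n1 + n2 − 2, which gives a quick sanity check on a hand calculation.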

When both sample sizes n1 and n2 are five or larger, the Student’s t approximation is very good. Notice that the sample variances (s1)² and (s2)² are not pooled. (If the question comes up, do not pool the variances.)

It is not necessary to compute the degrees of freedom by hand. A calculator or computer computes them easily.

Example 10.1 Independent groups

The average amount of time boys and girls aged 7 to 11 spend playing sports each day is believed to be the same. A study is done and data are collected, resulting in the data in Table 10.1. Each population has a normal distribution.

| | Sample Size | Average Number of Hours Playing Sports per Day | Sample Standard Deviation |
|---|---|---|---|
| Girls | 9 | 2 | 0.866 |
| Boys | 16 | 3.2 | 1.00 |
Table 10.1

Is there a difference in the mean amount of time boys and girls aged 7 to 11 play sports each day? Test at the 5 percent level of significance.
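
One way to carry out this two-tailed test without a calculator is sketched below in Python, using only the standard library. The helper names (`t_pdf`, `t_cdf`, `welch_test`) are assumptions made for illustration, and the Student's t probability is approximated by numerical integration (Simpson's rule) rather than taken from a statistics package.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with (possibly fractional) degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def welch_test(xbar1, s1, n1, xbar2, s2, n2):
    """Return (t, df, two-tailed p-value) for H0: mu1 = mu2, variances not pooled."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (xbar1 - xbar2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    p = 2 * t_cdf(-abs(t), df)
    return t, df, p

# Table 10.1: girls are group 1, boys are group 2.
t, df, p = welch_test(2.0, 0.866, 9, 3.2, 1.00, 16)
alpha = 0.05
print(f"t = {t:.2f}, df = {df:.2f}, p-value = {p:.4f}")
print("reject H0" if p < alpha else "do not reject H0")
```

Because the p-value is smaller than 0.05, this sketch points toward rejecting the null hypothesis that the mean times are equal.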

Try It 10.1

Two samples are shown in Table 10.2. Both have normal distributions. The means for the two populations are thought to be the same. Is there a difference in the means? Test at the 5 percent level of significance.

| | Sample Size | Sample Mean | Sample Standard Deviation |
|---|---|---|---|
| Population A | 25 | 5 | 1 |
| Population B | 16 | 4.7 | 1.2 |
Table 10.2

NOTE

When the sum of the sample sizes is larger than 30 (n1 + n2 > 30), you can use the normal distribution to approximate the Student’s t.
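
To illustrate the note, the sketch below applies the normal approximation to the summary statistics in Table 10.2, where n1 + n2 = 41. The variable names are illustrative assumptions; `NormalDist` comes from Python's standard `statistics` module.

```python
import math
from statistics import NormalDist

# Summary statistics from Table 10.2.
xbar_a, s_a, n_a = 5.0, 1.0, 25
xbar_b, s_b, n_b = 4.7, 1.2, 16

se = math.sqrt(s_a**2 / n_a + s_b**2 / n_b)  # standard error, not pooled
z = (xbar_a - xbar_b) / se                   # test statistic under H0: muA = muB
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 3), round(p_two_tailed, 3))
```

The p-value here is far above 0.05, so under this approximation the data would not give sufficient evidence of a difference in the means.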

Example 10.2

A study is done by a community group in two neighboring colleges to determine which one graduates students with more math classes. College A samples 11 graduates. Their average is 4 math classes with a standard deviation of 1.5 math classes. College B samples nine graduates. Their average is 3.5 math classes with a standard deviation of 1 math class. The community group believes that a student who graduates from College A has taken more math classes, on average. Both populations have a normal distribution. Test at a 1 percent significance level. Answer the following questions:

a. Is this a test of two means or two proportions?

b. Are the population standard deviations known or unknown?

c. Which distribution do you use to perform the test?

d. What is the random variable?

e. What are the null and alternate hypotheses? Write the null and alternate hypotheses in symbols.

f. Is this test right-, left-, or two-tailed?

g. What is the p-value?

h. Do you reject or not reject the null hypothesis?

i. Conclusion:

Try It 10.2

A study is done to determine if Company A retains its workers longer than Company B. Company A samples 15 workers, and their average time with the company is 5 years with a standard deviation of 1.2. Company B samples 20 workers, and their average time with the company is 4.5 years with a standard deviation of 0.8. The populations are normally distributed.

  1. Are the population standard deviations known?
  2. Conduct an appropriate hypothesis test. At the 5 percent significance level, what is your conclusion?

Example 10.3

A professor at a large community college wanted to determine whether there is a difference in the means of final exam scores between students who took his statistics course online and the students who took his face-to-face statistics class. He believed that the mean of the final exam scores for the online class would be lower than that of the face-to-face class. Was the professor correct? The randomly selected 30 final exam scores from each group are listed in Table 10.3 and Table 10.4.

67.6 41.2 85.3 55.9 82.4 91.2 73.5 94.1 64.7 64.7
70.6 38.2 61.8 88.2 70.6 58.8 91.2 73.5 82.4 35.5
94.1 88.2 64.7 55.9 88.2 97.1 85.3 61.8 79.4 79.4
Table 10.3 Online Class
77.9 95.3 81.2 74.1 98.8 88.2 85.9 92.9 87.1 88.2
69.4 57.6 69.4 67.1 97.6 85.9 88.2 91.8 78.8 71.8
98.8 61.2 92.9 90.6 97.6 100 95.3 83.5 92.9 89.4
Table 10.4 Face-to-Face Class

Is the mean of the final exam scores of the online class lower than the mean of the final exam scores of the face-to-face class? Test at a 5 percent significance level. Answer the following questions:

  1. Is this a test of two means or two proportions?
  2. Are the population standard deviations known or unknown?
  3. Which distribution do you use to perform the test?
  4. What is the random variable?
  5. What are the null and alternative hypotheses? Write the null and alternative hypotheses in words and in symbols.
  6. Is this test right-, left-, or two-tailed?
  7. What is the p-value?
  8. Do you reject or not reject the null hypothesis?
  9. At the _____ level of significance, from the sample data, there ______ (is/is not) sufficient evidence to conclude that ______.

(See the conclusion in Example 10.2, and write yours in a similar fashion.)

Using the TI-83, 83+, 84, 84+ Calculator

First put the data for each group into two lists (such as L1 and L2). Press STAT. Arrow over to TESTS and press 4:2SampTTest. Make sure Data is highlighted and press ENTER. Arrow down and enter L1 for the first list and L2 for the second list. Arrow down to μ1: and arrow to ≠μ2 (does not equal). Press ENTER. Arrow down to Pooled: No. Press ENTER. Arrow down to Calculate and press ENTER.

Note

Be careful not to mix up the information for Group 1 and Group 2!
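
For readers without a TI calculator, the list-based workflow above can be mirrored in Python. The sketch below is only an approximation of what the calculator does: it stores the scores from Table 10.3 and Table 10.4 in two lists, then computes the unpooled t-score and the degrees of freedom. The name `welch_from_data` is a hypothetical helper.

```python
import math
from statistics import mean, variance

# Table 10.3 (online class) and Table 10.4 (face-to-face class).
online = [67.6, 41.2, 85.3, 55.9, 82.4, 91.2, 73.5, 94.1, 64.7, 64.7,
          70.6, 38.2, 61.8, 88.2, 70.6, 58.8, 91.2, 73.5, 82.4, 35.5,
          94.1, 88.2, 64.7, 55.9, 88.2, 97.1, 85.3, 61.8, 79.4, 79.4]
face_to_face = [77.9, 95.3, 81.2, 74.1, 98.8, 88.2, 85.9, 92.9, 87.1, 88.2,
                69.4, 57.6, 69.4, 67.1, 97.6, 85.9, 88.2, 91.8, 78.8, 71.8,
                98.8, 61.2, 92.9, 90.6, 97.6, 100, 95.3, 83.5, 92.9, 89.4]

def welch_from_data(group1, group2):
    """Return (t, df) for H0: mu1 = mu2, with variances not pooled."""
    n1, n2 = len(group1), len(group2)
    v1 = variance(group1) / n1  # sample variance divided by sample size
    v2 = variance(group2) / n2
    t = (mean(group1) - mean(group2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Group 1 is the online class, group 2 is the face-to-face class.
t, df = welch_from_data(online, face_to_face)
print(f"means: {mean(online):.2f} vs {mean(face_to_face):.2f}")
print(f"t = {t:.2f}, df = {df:.2f}")
```

As with the calculator, the order of the two groups determines the sign of t, so keep the same order when stating the hypotheses.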

Cohen’s Standards for Small, Medium, and Large Effect Sizes

Cohen’s d is a measure of effect size based on the differences between two means. Cohen’s d, named for U.S. statistician Jacob Cohen, measures the relative strength of the differences between the means of two populations based on sample data. The calculated value of effect size is then compared to Cohen’s standards of small, medium, and large effect sizes.

| Size of Effect | d |
|---|---|
| Small | 0.2 |
| Medium | 0.5 |
| Large | 0.8 |
Table 10.5 Cohen’s Standard Effect Sizes

Cohen’s d is the measure of the difference between two means divided by the pooled standard deviation: $d = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}}$, where $s_{pooled} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$.
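
The two formulas can be sketched in Python as follows. The function names are illustrative assumptions; the inputs are the summary statistics stated in Example 10.2 (College A: n = 11, mean 4, standard deviation 1.5; College B: n = 9, mean 3.5, standard deviation 1).

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation, weighting each sample variance by n - 1."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def cohens_d(xbar1, s1, n1, xbar2, s2, n2):
    """Difference between the sample means in units of the pooled standard deviation."""
    return (xbar1 - xbar2) / pooled_sd(s1, n1, s2, n2)

# College A is group 1, College B is group 2.
d = cohens_d(4.0, 1.5, 11, 3.5, 1.0, 9)
print(round(d, 3))  # about 0.384
```

A value near 0.384 falls between the small (0.2) and medium (0.5) benchmarks in Table 10.5. Note that the pooled standard deviation is used only for the effect size; the hypothesis test itself does not pool the variances.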

Example 10.4

Calculate Cohen’s d for Example 10.2. Is the size of the effect small, medium, or large? Explain what the size of the effect means for this problem.

Example 10.5

Calculate Cohen’s d for Example 10.3. Is the size of the effect small, medium, or large? Explain what the size of the effect means for this problem.

Try It 10.5

Weighted alpha is a measure of risk-adjusted performance of stocks over a period of a year. A high positive weighted alpha signifies a stock whose price has risen, while a small positive weighted alpha indicates an unchanged stock price during the time period. Weighted alpha is used to identify companies with strong upward or downward trends. The weighted alphas for the top 30 stocks of banks in the Northeast and in the West, as identified by Nasdaq on May 24, 2013, are listed in Table 10.6 and Table 10.7, respectively.

94.2 75.2 69.6 52.0 48.0 41.9 36.4 33.4 31.5 27.6
77.3 71.9 67.5 50.6 46.2 38.4 35.2 33.0 28.7 26.5
76.3 71.7 56.3 48.7 43.2 37.6 33.7 31.8 28.5 26.0
Table 10.6 Northeast
126.0 70.6 65.2 51.4 45.5 37.0 33.0 29.6 23.7 22.6
116.1 70.6 58.2 51.2 43.2 36.0 31.4 28.7 23.5 21.6
78.2 68.2 55.6 50.3 39.0 34.1 31.0 25.3 23.4 21.5
Table 10.7 West

Is there a difference in the weighted alpha of the top 30 stocks of banks in the Northeast and in the West? Test at a 5 percent significance level. Answer the following questions:

  1. Is this a test of two means or two proportions?
  2. Are the population standard deviations known or unknown?
  3. Which distribution do you use to perform the test?
  4. What is the random variable?
  5. What are the null and alternative hypotheses? Write the null and alternative hypotheses in words and in symbols.
  6. Is this test right-, left-, or two-tailed?
  7. What is the p-value?
  8. Do you reject or not reject the null hypothesis?
  9. At the _____ level of significance, from the sample data, there ______ (is/is not) sufficient evidence to conclude that ______.
  10. Calculate Cohen’s d and interpret it.
Citation/Attribution

Want to cite, share, or modify this book? This book is Creative Commons Attribution License 4.0 and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/statistics/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/statistics/pages/1-introduction
Citation information

© Apr 3, 2020 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License 4.0 license. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.