Skip to Content
OpenStax Logo
Introductory Statistics

11.3 Test of Independence

Introductory Statistics11.3 Test of Independence
Buy book
  1. Preface
  2. 1 Sampling and Data
    1. Introduction
    2. 1.1 Definitions of Statistics, Probability, and Key Terms
    3. 1.2 Data, Sampling, and Variation in Data and Sampling
    4. 1.3 Frequency, Frequency Tables, and Levels of Measurement
    5. 1.4 Experimental Design and Ethics
    6. 1.5 Data Collection Experiment
    7. 1.6 Sampling Experiment
    8. Key Terms
    9. Chapter Review
    10. Practice
    11. Homework
    12. Bringing It Together: Homework
    13. References
    14. Solutions
  3. 2 Descriptive Statistics
    1. Introduction
    2. 2.1 Stem-and-Leaf Graphs (Stemplots), Line Graphs, and Bar Graphs
    3. 2.2 Histograms, Frequency Polygons, and Time Series Graphs
    4. 2.3 Measures of the Location of the Data
    5. 2.4 Box Plots
    6. 2.5 Measures of the Center of the Data
    7. 2.6 Skewness and the Mean, Median, and Mode
    8. 2.7 Measures of the Spread of the Data
    9. 2.8 Descriptive Statistics
    10. Key Terms
    11. Chapter Review
    12. Formula Review
    13. Practice
    14. Homework
    15. Bringing It Together: Homework
    16. References
    17. Solutions
  4. 3 Probability Topics
    1. Introduction
    2. 3.1 Terminology
    3. 3.2 Independent and Mutually Exclusive Events
    4. 3.3 Two Basic Rules of Probability
    5. 3.4 Contingency Tables
    6. 3.5 Tree and Venn Diagrams
    7. 3.6 Probability Topics
    8. Key Terms
    9. Chapter Review
    10. Formula Review
    11. Practice
    12. Bringing It Together: Practice
    13. Homework
    14. Bringing It Together: Homework
    15. References
    16. Solutions
  5. 4 Discrete Random Variables
    1. Introduction
    2. 4.1 Probability Distribution Function (PDF) for a Discrete Random Variable
    3. 4.2 Mean or Expected Value and Standard Deviation
    4. 4.3 Binomial Distribution
    5. 4.4 Geometric Distribution
    6. 4.5 Hypergeometric Distribution
    7. 4.6 Poisson Distribution
    8. 4.7 Discrete Distribution (Playing Card Experiment)
    9. 4.8 Discrete Distribution (Lucky Dice Experiment)
    10. Key Terms
    11. Chapter Review
    12. Formula Review
    13. Practice
    14. Homework
    15. References
    16. Solutions
  6. 5 Continuous Random Variables
    1. Introduction
    2. 5.1 Continuous Probability Functions
    3. 5.2 The Uniform Distribution
    4. 5.3 The Exponential Distribution
    5. 5.4 Continuous Distribution
    6. Key Terms
    7. Chapter Review
    8. Formula Review
    9. Practice
    10. Homework
    11. References
    12. Solutions
  7. 6 The Normal Distribution
    1. Introduction
    2. 6.1 The Standard Normal Distribution
    3. 6.2 Using the Normal Distribution
    4. 6.3 Normal Distribution (Lap Times)
    5. 6.4 Normal Distribution (Pinkie Length)
    6. Key Terms
    7. Chapter Review
    8. Formula Review
    9. Practice
    10. Homework
    11. References
    12. Solutions
  8. 7 The Central Limit Theorem
    1. Introduction
    2. 7.1 The Central Limit Theorem for Sample Means (Averages)
    3. 7.2 The Central Limit Theorem for Sums
    4. 7.3 Using the Central Limit Theorem
    5. 7.4 Central Limit Theorem (Pocket Change)
    6. 7.5 Central Limit Theorem (Cookie Recipes)
    7. Key Terms
    8. Chapter Review
    9. Formula Review
    10. Practice
    11. Homework
    12. References
    13. Solutions
  9. 8 Confidence Intervals
    1. Introduction
    2. 8.1 A Single Population Mean using the Normal Distribution
    3. 8.2 A Single Population Mean using the Student t Distribution
    4. 8.3 A Population Proportion
    5. 8.4 Confidence Interval (Home Costs)
    6. 8.5 Confidence Interval (Place of Birth)
    7. 8.6 Confidence Interval (Women's Heights)
    8. Key Terms
    9. Chapter Review
    10. Formula Review
    11. Practice
    12. Homework
    13. References
    14. Solutions
  10. 9 Hypothesis Testing with One Sample
    1. Introduction
    2. 9.1 Null and Alternative Hypotheses
    3. 9.2 Outcomes and the Type I and Type II Errors
    4. 9.3 Distribution Needed for Hypothesis Testing
    5. 9.4 Rare Events, the Sample, Decision and Conclusion
    6. 9.5 Additional Information and Full Hypothesis Test Examples
    7. 9.6 Hypothesis Testing of a Single Mean and Single Proportion
    8. Key Terms
    9. Chapter Review
    10. Formula Review
    11. Practice
    12. Homework
    13. References
    14. Solutions
  11. 10 Hypothesis Testing with Two Samples
    1. Introduction
    2. 10.1 Two Population Means with Unknown Standard Deviations
    3. 10.2 Two Population Means with Known Standard Deviations
    4. 10.3 Comparing Two Independent Population Proportions
    5. 10.4 Matched or Paired Samples
    6. 10.5 Hypothesis Testing for Two Means and Two Proportions
    7. Key Terms
    8. Chapter Review
    9. Formula Review
    10. Practice
    11. Homework
    12. Bringing It Together: Homework
    13. References
    14. Solutions
  12. 11 The Chi-Square Distribution
    1. Introduction
    2. 11.1 Facts About the Chi-Square Distribution
    3. 11.2 Goodness-of-Fit Test
    4. 11.3 Test of Independence
    5. 11.4 Test for Homogeneity
    6. 11.5 Comparison of the Chi-Square Tests
    7. 11.6 Test of a Single Variance
    8. 11.7 Lab 1: Chi-Square Goodness-of-Fit
    9. 11.8 Lab 2: Chi-Square Test of Independence
    10. Key Terms
    11. Chapter Review
    12. Formula Review
    13. Practice
    14. Homework
    15. Bringing It Together: Homework
    16. References
    17. Solutions
  13. 12 Linear Regression and Correlation
    1. Introduction
    2. 12.1 Linear Equations
    3. 12.2 Scatter Plots
    4. 12.3 The Regression Equation
    5. 12.4 Testing the Significance of the Correlation Coefficient
    6. 12.5 Prediction
    7. 12.6 Outliers
    8. 12.7 Regression (Distance from School)
    9. 12.8 Regression (Textbook Cost)
    10. 12.9 Regression (Fuel Efficiency)
    11. Key Terms
    12. Chapter Review
    13. Formula Review
    14. Practice
    15. Homework
    16. Bringing It Together: Homework
    17. References
    18. Solutions
  14. 13 F Distribution and One-Way ANOVA
    1. Introduction
    2. 13.1 One-Way ANOVA
    3. 13.2 The F Distribution and the F-Ratio
    4. 13.3 Facts About the F Distribution
    5. 13.4 Test of Two Variances
    6. 13.5 Lab: One-Way ANOVA
    7. Key Terms
    8. Chapter Review
    9. Formula Review
    10. Practice
    11. Homework
    12. References
    13. Solutions
  15. A | Review Exercises (Ch 3-13)
  16. B | Practice Tests (1-4) and Final Exams
  17. C | Data Sets
  18. D | Group and Partner Projects
  19. E | Solution Sheets
  20. F | Mathematical Phrases, Symbols, and Formulas
  21. G | Notes for the TI-83, 83+, 84, 84+ Calculators
  22. H | Tables
  23. Index

Tests of independence involve using a contingency table of observed (data) values.

The test statistic for a test of independence is similar to that of a goodness-of-fit test:

Σ (ij) (OE) 2 E Σ (ij) (OE) 2 E

where:

  • O = observed values
  • E = expected values
  • i = the number of rows in the table
  • j = the number of columns in the table

There are ijij terms of the form (OE) 2 E (OE) 2 E .

A test of independence determines whether two factors are independent or not. You first encountered the term independence in Probability Topics. As a review, consider the following example.

Note

The expected value for each cell needs to be at least five in order for you to use this test.

Example 11.5

Suppose A = a speeding violation in the last year and B = a cell phone user while driving. If A and B are independent then P(A AND B) = P(A)P(B). A AND B is the event that a driver received a speeding violation last year and also used a cell phone while driving. Suppose, in a study of drivers who received speeding violations in the last year, and who used cell phone while driving, that 755 people were surveyed. Out of the 755, 70 had a speeding violation and 685 did not; 305 used cell phones while driving and 450 did not.

Let y = expected number of drivers who used a cell phone while driving and received speeding violations.

If A and B are independent, then P(A AND B) = P(A)P(B). By substitution,

y 755 =( 70 755 )( 305 755 ) y 755 =( 70 755 )( 305 755 )

Solve for y: y = (70)(305) 755 =28.3 (70)(305) 755 =28.3

About 28 people from the sample are expected to use cell phones while driving and to receive speeding violations.

In a test of independence, we state the null and alternative hypotheses in words. Since the contingency table consists of two factors, the null hypothesis states that the factors are independent and the alternative hypothesis states that they are not independent (dependent). If we do a test of independence using the example, then the null hypothesis is:

H0: Being a cell phone user while driving and receiving a speeding violation are independent events.

If the null hypothesis were true, we would expect about 28 people to use cell phones while driving and to receive a speeding violation.

The test of independence is always right-tailed because of the calculation of the test statistic. If the expected and observed values are not close together, then the test statistic is very large and way out in the right tail of the chi-square curve, as it is in a goodness-of-fit.

The number of degrees of freedom for the test of independence is:

df = (number of columns - 1)(number of rows - 1)

The following formula calculates the expected number (E):

E=(row total)(column total)total number surveyedE=(row total)(column total)total number surveyed

Try It 11.5

A sample of 300 students is taken. Of the students surveyed, 50 were music students, while 250 were not. Ninety-seven were on the honor roll, while 203 were not. If we assume being a music student and being on the honor roll are independent events, what is the expected number of music students who are also on the honor roll?

Example 11.6

In a volunteer group, adults 21 and older volunteer from one to nine hours each week to spend time with a disabled senior citizen. The program recruits among community college students, four-year college students, and nonstudents. In Table 11.15 is a sample of the adult volunteers and the number of hours they volunteer per week.

Type of Volunteer 1–3 Hours 4–6 Hours 7–9 Hours Row Total
Community College Students 111 96 48 255
Four-Year College Students 96 133 61 290
Nonstudents 91 150 53 294
Column Total 298 379 162 839
Table 11.15 Number of Hours Worked Per Week by Volunteer Type (Observed) The table contains observed (O) values (data).

Is the number of hours volunteered independent of the type of volunteer?

Solution 11.6

The observed table and the question at the end of the problem, "Is the number of hours volunteered independent of the type of volunteer?" tell you this is a test of independence. The two factors are number of hours volunteered and type of volunteer. This test is always right-tailed.

H0: The number of hours volunteered is independent of the type of volunteer.

Ha: The number of hours volunteered is dependent on the type of volunteer.

The expected result are in Table 11.16.

The table contains expected (E) values (data).
Type of Volunteer 1-3 Hours 4-6 Hours 7-9 Hours
Community College Students90.57115.1949.24
Four-Year College Students103.00131.0056.00
Nonstudents104.42132.8156.77
Table 11.16 Number of Hours Worked Per Week by Volunteer Type (Expected)

For example, the calculation for the expected frequency for the top left cell is

E= (row total)(column total) total number surveyed = ( 255 )( 298 ) 839 =90.57 E= (row total)(column total) total number surveyed = ( 255 )( 298 ) 839 =90.57

Calculate the test statistic: χ2 = 12.99 (calculator or computer)

Distribution for the test:χ42χ42

df = (3 columns – 1)(3 rows – 1) = (2)(2) = 4

Graph:

Nonsymmetrical chi-square curve with values of 0 and 12.99 on the x-axis representing the test statistic of number of hours worked by volunteers of different types. A vertical upward line extends from 12.99 to the curve and the area to the right of this is equal to the p-value.
Figure 11.7

Probability statement: p-value=P(χ2 > 12.99) = 0.0113

Compare α and the p-value: Since no α is given, assume α = 0.05. p-value = 0.0113. α > p-value.

Make a decision: Since α > p-value, reject H0. This means that the factors are not independent.

Conclusion: At a 5% level of significance, from the data, there is sufficient evidence to conclude that the number of hours volunteered and the type of volunteer are dependent on one another.

For the example in Table 11.16, if there had been another type of volunteer, teenagers, what would the degrees of freedom be?

Using the TI-83, 83+, 84, 84+ Calculator

Press the MATRX key and arrow over to EDIT. Press 1:[A]. Press 3 ENTER 3 ENTER. Enter the table values by row from Table 11.16. Press ENTER after each. Press 2nd QUIT. Press STAT and arrow over to TESTS. Arrow down to C:χ2-TEST. Press ENTER. You should see Observed:[A] and Expected:[B]. Arrow down to Calculate. Press ENTER. The test statistic is 12.9909 and the p-value = 0.0113. Do the procedure a second time, but arrow down to Draw instead of calculate.

Try It 11.6

The Bureau of Labor Statistics gathers data about employment in the United States. A sample is taken to calculate the number of U.S. citizens working in one of several industry sectors over time. Table 11.17 shows the results:

Industry Sector 2000 2010 2020 Total
Nonagriculture wage and salary 13,243 13,044 15,018 41,305
Goods-producing, excluding agriculture 2,457 1,771 1,950 6,178
Services-providing 10,786 11,273 13,068 35,127
Agriculture, forestry, fishing, and hunting 240 214 201 655
Nonagriculture self-employed and unpaid family worker 931 894 972 2,797
Secondary wage and salary jobs in agriculture and private household industries 14 11 11 36
Secondary jobs as a self-employed or unpaid family worker 196 144 152 492
Total 27,867 27,351 31,372 86,590
Table 11.17

We want to know if the change in the number of jobs is independent of the change in years. State the null and alternative hypotheses and the degrees of freedom.

Example 11.7

De Anza College is interested in the relationship between anxiety level and the need to succeed in school. A random sample of 400 students took a test that measured anxiety level and need to succeed in school. Table 11.18 shows the results. De Anza College wants to know if anxiety level and need to succeed in school are independent events.

Need to Succeed in School High
Anxiety
Med-high
Anxiety
Medium
Anxiety
Med-low
Anxiety
Low
Anxiety
Row Total
High Need 35 42 53 15 10 155
Medium Need 18 48 63 33 31 193
Low Need 4 5 11 15 17 52
Column Total 57 95 127 63 58 400
Table 11.18 Need to Succeed in School vs. Anxiety Level

a. How many high anxiety level students are expected to have a high need to succeed in school?

Solution 11.7

a. The column total for a high anxiety level is 57. The row total for high need to succeed in school is 155. The sample size or total surveyed is 400.

E=(row total)(column total)total surveyed=15557400=22.09E=(row total)(column total)total surveyed=15557400=22.09

The expected number of students who have a high anxiety level and a high need to succeed in school is about 22.

b. If the two variables are independent, how many students do you expect to have a low need to succeed in school and a med-low level of anxiety?

Solution 11.7

b. The column total for a med-low anxiety level is 63. The row total for a low need to succeed in school is 52. The sample size or total surveyed is 400.

c. E=(row total)(column total) total surveyedE=(row total)(column total) total surveyed = ________

Solution 11.7

c. E=(row total)(column total)total surveyed = 8.19E=(row total)(column total)total surveyed = 8.19

d. The expected number of students who have a med-low anxiety level and a low need to succeed in school is about ________.

Solution 11.7

d. 8

Try It 11.7

Refer back to the information in Try It. How many service providing jobs are there expected to be in 2020? How many nonagriculture wage and salary jobs are there expected to be in 2020?

Citation/Attribution

Want to cite, share, or modify this book? This book is Creative Commons Attribution License 4.0 and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/introductory-statistics/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/introductory-statistics/pages/1-introduction
Citation information

© Sep 19, 2013 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License 4.0 license. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.