Introductory Business Statistics 2e

10.6 Matched or Paired Samples

In most cases of economic or business data we have little or no control over how the data are gathered. In this sense the data are not the result of a planned, controlled experiment. In some cases, however, we can develop data that are part of a controlled experiment. This situation occurs frequently in quality control. Imagine that two machines built to the same design, but at different manufacturing plants, are being tested for differences in some production metric, such as speed of output, or in meeting some production specification, such as strength of the product. The test has the same format as the tests we have been conducting, but here we have matched pairs for which we can test whether differences exist. Each observation has a matched pair against which a difference is calculated. First, the difference in the metric between the two lists of observations is calculated for each pair; this difference is typically labeled with the letter "d." Then the mean of these matched differences, $\bar{x}_d = \frac{\sum (x_1 - x_2)}{n}$, is calculated, as is their standard deviation, $s_d$. We expect the standard deviation of the differences of matched pairs to be smaller than that of unmatched pairs, because the two measurements in each pair are correlated and so less variation should remain in the differences.

When using a hypothesis test for matched or paired samples, the following characteristics may be present:

  1. Simple random sampling is used.
  2. Sample sizes are often small.
  3. Two measurements (samples) are drawn from the same pair of individuals or objects.
  4. Differences are calculated from the matched or paired samples.
  5. The differences form the sample that is used for the hypothesis test.
  6. Either the matched pairs have differences that come from a population that is normal, or the number of differences is sufficiently large that the distribution of the sample mean of differences is approximately normal, by the Central Limit Theorem.

In a hypothesis test for matched or paired samples, subjects are matched in pairs and differences are calculated. The differences are the data. The population mean of the differences, $\mu_d$, is then tested using a Student's t-test for a single population mean with $n - 1$ degrees of freedom, where $n$ is the number of differences, that is, the number of pairs, not the number of individual observations.

The null and alternative hypotheses for this test are:
$H_0: \mu_d = 0$
$H_a: \mu_d \neq 0$
The test statistic is:
$t_c = \dfrac{\bar{x}_d - \mu_d}{\left(\dfrac{s_d}{\sqrt{n}}\right)}$
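The test statistic above can be sketched in Python using only the standard library. This is a minimal illustration, not part of the original text: the function name `paired_t_statistic` and its parameters are hypothetical, and the sample values in the usage lines are made up purely to show the call.

```python
import math
import statistics


def paired_t_statistic(first, second, mu_d=0.0):
    """Return (mean difference, sd of differences, t statistic, degrees of freedom).

    Hypothetical helper: differences are taken as first - second, and mu_d is
    the hypothesized population mean difference (0 under the null above).
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    # The differences are the data for the test.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are needed")
    mean_d = statistics.mean(diffs)
    # statistics.stdev uses the sample formula (n - 1 in the denominator).
    sd_d = statistics.stdev(diffs)
    t_c = (mean_d - mu_d) / (sd_d / math.sqrt(n))
    return mean_d, sd_d, t_c, n - 1


# Made-up values, for illustration only.
mean_d, sd_d, t_c, df = paired_t_statistic([3, 5, 7], [1, 2, 3])
print(f"mean_d = {mean_d}, s_d = {sd_d}, t_c = {t_c:.3f}, df = {df}")
```

The degrees of freedom returned are the number of pairs minus one, which is the value to use when looking up the critical value in a t-table.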

Example 10.9

Problem

A company has developed a training program for its entering employees because management has become concerned with the results of the six-month employee review. They hope that the training program can result in better six-month reviews. Each trainee constitutes a “pair”: the score the employee received when first entering the firm and the score given at the six-month review. The difference between the two scores was calculated for each employee, and the means before and after the training program were calculated. The sample mean before the training program was 20.4 and the sample mean after the training program was 23.9. The standard deviation of the differences in the two scores across the 20 employees was 3.8 points. Test at the 10% significance level the null hypothesis that the two population means are equal against the alternative that the training program helps improve the employees’ scores.
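One way the calculation for this problem might be set up is sketched below. It assumes the differences are defined as after minus before, so that the mean difference equals the difference of the two sample means. The critical value quoted at the end is read from a standard Student's t table and is not given in the problem statement.

$$H_0: \mu_d = 0 \qquad H_a: \mu_d > 0$$

$$\bar{x}_d = 23.9 - 20.4 = 3.5, \qquad s_d = 3.8, \qquad n = 20$$

$$t_c = \frac{\bar{x}_d - \mu_d}{\left(\frac{s_d}{\sqrt{n}}\right)} = \frac{3.5 - 0}{\left(\frac{3.8}{\sqrt{20}}\right)} \approx \frac{3.5}{0.850} \approx 4.12$$

With $n - 1 = 19$ degrees of freedom, the one-tailed critical value at the 10% level is about 1.328. Because 4.12 is larger than 1.328, under these assumptions the test statistic falls in the right tail, and the null hypothesis would be rejected.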

Try It 10.9

Five ball players think they can throw the same distance with their dominant hand (throwing) and off-hand (catching hand). The data were collected and recorded in Table 10.10. Conduct a hypothesis test to determine whether the mean difference in distances between the dominant and off-hand is significant. Test at the 5% level.

                Player 1   Player 2   Player 3   Player 4   Player 5
Dominant Hand   120        111        135        140        125
Off-hand        105        109        98         111        99
Table 10.10

Example 10.10

Problem

A study was conducted to investigate the effectiveness of hypnotism in reducing pain. Results for randomly selected subjects are shown in Table 10.11. A lower score indicates less pain. The "before" value is matched to an "after" value and the differences are calculated. Are the sensory measurements, on average, lower after hypnotism? Test at a 5% significance level.

Subject   A     B     C     D      E      F     G     H
Before    6.6   6.5   9.0   10.3   11.3   8.1   6.3   11.6
After     6.8   2.4   7.4   8.5    8.1    6.1   3.4   2.0
Table 10.11
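The data in Table 10.11 can be run through the paired test with a short Python sketch. It assumes the differences are taken as after minus before, so that a negative mean difference indicates less pain, and that the test is left-tailed. The critical value in the code is read from a standard t-table for 7 degrees of freedom and is not stated in the problem.

```python
import math
import statistics

# Pain scores from Table 10.11 (subjects A through H).
before = [6.6, 6.5, 9.0, 10.3, 11.3, 8.1, 6.3, 11.6]
after = [6.8, 2.4, 7.4, 8.5, 8.1, 6.1, 3.4, 2.0]

# Assumed convention: d = after - before, so a negative mean means less pain.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)        # sample standard deviation (n - 1)
t_c = mean_d / (sd_d / math.sqrt(n))  # hypothesized mu_d is 0

# Assumption: left-tail critical value for 7 df at the 5% level, from a t-table.
T_CRITICAL = -1.895
reject_h0 = t_c < T_CRITICAL

print(f"mean_d = {mean_d:.3f}, s_d = {sd_d:.3f}, t_c = {t_c:.3f}, df = {n - 1}")
print("reject H0" if reject_h0 else "cannot reject H0")
```

Under these assumptions the mean difference is about -3.13, the standard deviation of the differences is about 2.91, and the test statistic is about -3.04, which lies beyond the critical value in the left tail.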

Try It 10.10

A study was conducted to investigate how effective a new diet was in lowering cholesterol. Results for the randomly selected subjects are shown in the table. The differences have a normal distribution. Are the subjects’ cholesterol levels lower on average after the diet? Test at the 5% level.

Subject   A     B     C     D     E     F     G     H     I
Before    209   210   205   198   216   217   238   240   222
After     199   207   189   209   217   202   211   223   201
Table 10.13

Example 10.11

A college softball coach was interested in whether the college's strength development class increased its players' maximum lift (in pounds). Four players were asked to participate in the study. The amount of weight they could each lift was recorded before they took the strength development class. After completing the class, the amount of weight they could each lift was again measured. The data are as follows:

Weight (in pounds)                           Player 1   Player 2   Player 3   Player 4
Amount of weight lifted prior to the class   205        241        338        368
Amount of weight lifted after the class      295        252        330        360
Table 10.14

The coach wants to know if the strength development class makes the players stronger, on average.
Record the differences data. Calculate the differences by subtracting the amount of weight lifted prior to the class from the weight lifted after completing the class. The data for the differences are: {90, 11, -8, -8}.

$\bar{x}_d = 21.3$, $s_d = 46.7$

Using the difference data, this becomes a test of a single mean.

Define the random variable: $\bar{X}_d$ = mean difference in the maximum lift per player.

The distribution for the hypothesis test is a Student's t with 3 degrees of freedom.

H0: μd ≤ 0, Ha: μd > 0

Figure 10.10 Normal distribution curve with values of 0 and 21.3. A vertical upward line extends from 21.3 to the curve and the p-value is indicated in the area to the right of this value.

Calculate the test statistic and look up the critical value: The calculated value of the test statistic is 0.91. The critical value of the Student's t at the 5% level of significance and 3 degrees of freedom is 2.353.

Decision: If the level of significance is 5%, we cannot reject the null hypothesis, because the calculated value of the test statistic is not in the tail.
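The figures in this example can be reproduced with a few lines of Python. This is an illustrative check rather than part of the original solution; the variable names are arbitrary, and the critical value 2.353 is the one quoted in the text.

```python
import math
import statistics

# Maximum lift in pounds from Table 10.14 (Players 1 through 4).
before = [205, 241, 338, 368]
after = [295, 252, 330, 360]

# Differences: weight lifted after the class minus weight lifted before.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)        # sample standard deviation (n - 1)
t_c = mean_d / (sd_d / math.sqrt(n))  # hypothesized mu_d is 0

# Right-tail critical value for 3 df at the 5% level, as quoted in the text.
T_CRITICAL = 2.353
reject_h0 = t_c > T_CRITICAL

print(f"differences = {diffs}")
print(f"mean_d = {mean_d:.1f}, s_d = {sd_d:.1f}, t_c = {t_c:.2f}, df = {n - 1}")
print("reject H0" if reject_h0 else "cannot reject H0")
```

The exact mean difference is 21.25, which the text reports rounded as 21.3. The test statistic of about 0.91 is well below 2.353, which matches the decision not to reject the null hypothesis.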

What is the conclusion?

At a 5% level of significance, from the sample data, there is not sufficient evidence to conclude that the strength development class helped to make the players stronger, on average. Be sure to note in your conclusion that a sample size of only 4 leaves only 3 degrees of freedom, which makes the critical value very large. In short, a sample this small cannot always yield meaningful conclusions.

Try It 10.11

A new prep class was designed to improve SAT test scores. Four students were selected at random. Their scores on two practice exams were recorded, one before the class and one after. The data are recorded in Table 10.15. Are the scores, on average, higher after the class? Test at a 5% level.

SAT Scores           Student 1   Student 2   Student 3   Student 4
Score before class   1840        1960        1920        2150
Score after class    1920        2160        2200        2100
Table 10.15

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/introductory-business-statistics-2e/pages/1-introduction

© Jul 18, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License . The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.