Introductory Business Statistics 2e

10.4 Comparing Two Independent Population Proportions

Proportions have their base in the binomial probability distribution. When the population parameters are known, we know the mean and standard deviation of that distribution. With a binary or categorical sample data set, we no longer have that knowledge: we do not know the population mean, $\mu = np$, or variance, $\sigma^2 = npq$. We can gather data that we know come from a binomial distribution without knowing the specific parameter. It is then that we have moved from probability to inferential statistics.

When conducting a hypothesis test that compares two independent population proportions, the following characteristics should be present:

  1. The two samples are random samples that are independent of each other.
  2. The number of successes is at least five, and the number of failures is at least five, for each of the samples.
  3. A growing body of literature states that each population must be at least ten, or perhaps even 20, times the size of its sample. This keeps each population from being over-sampled, which would bias the results.
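
The conditions above can be checked mechanically before running a test. The following is a minimal sketch, not part of the original text: the function names and the population sizes are hypothetical, while the sample counts (20 and 12 defaults out of 200 files each) are borrowed from Example 10.6 in this section.

```python
def counts_large_enough(successes: int, n: int, minimum: int = 5) -> bool:
    """Condition 2: at least `minimum` successes and `minimum` failures."""
    failures = n - successes
    return successes >= minimum and failures >= minimum


def population_large_enough(population_size: int, n: int, factor: int = 10) -> bool:
    """Condition 3: population is at least `factor` times the sample size."""
    return population_size >= factor * n


# Sample counts from Example 10.6; the population sizes are assumed values.
samples = {
    "A": {"successes": 20, "n": 200, "population": 5000},
    "B": {"successes": 12, "n": 200, "population": 3000},
}

for name, s in samples.items():
    ok = counts_large_enough(s["successes"], s["n"]) and population_large_enough(
        s["population"], s["n"]
    )
    print(f"Sample {name}: conditions met = {ok}")
```

Condition 1 (random, independent samples) is a matter of study design and cannot be verified from the counts alone.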

Comparing two proportions, like comparing two means, is common. If two estimated proportions are different, it may be due to a difference in the populations or it may be due to chance in the sampling. A hypothesis test can help determine if a difference in the estimated proportions reflects a difference in the two population proportions.

Like the case of differences in sample means, we construct a sampling distribution for differences in sample proportions: $(p'_A - p'_B)$, where $p'_A = \frac{X_A}{n_A}$ and $p'_B = \frac{X_B}{n_B}$ are the sample proportions for the two sets of data in question. $X_A$ and $X_B$ are the numbers of successes in each sample group respectively, and $n_A$ and $n_B$ are the respective sample sizes from the two groups. Again we go to the Central Limit Theorem to find the distribution of this sampling distribution for the differences in sample proportions. And again we find that this sampling distribution, like the ones before it, is approximately normally distributed, as established by the Central Limit Theorem and as seen in Figure 10.5.

Figure 10.5
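
The behavior of this sampling distribution can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population proportions 0.10 and 0.06 and the sample sizes of 200 are hypothetical values chosen for illustration, and the unpooled standard error $\sqrt{p_A q_A / n_A + p_B q_B / n_B}$ used for comparison is the standard result for independent samples rather than a formula given in this section.

```python
import math
import random
import statistics


def simulate_differences(p_a, n_a, p_b, n_b, reps=2000, seed=0):
    """Draw `reps` pairs of independent samples and return p'_A - p'_B for each."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Count successes in each sample as a sum of Bernoulli trials.
        x_a = sum(rng.random() < p_a for _ in range(n_a))
        x_b = sum(rng.random() < p_b for _ in range(n_b))
        diffs.append(x_a / n_a - x_b / n_b)
    return diffs


def theoretical_se(p_a, n_a, p_b, n_b):
    """Standard deviation of p'_A - p'_B for independent samples."""
    return math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)


# Hypothetical population proportions and sample sizes.
diffs = simulate_differences(0.10, 200, 0.06, 200)
print("mean of simulated differences:", round(statistics.mean(diffs), 4))
print("std of simulated differences: ", round(statistics.stdev(diffs), 4))
print("theoretical standard error:   ", round(theoretical_se(0.10, 200, 0.06, 200), 4))
```

The simulated differences should center near the true difference $p_A - p_B$, with a spread close to the theoretical standard error; a histogram of `diffs` would show the bell shape.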

Generally, the null hypothesis allows for the test of a difference of a particular value, $\delta_0$, just as we did for the case of differences in means.

$$H_0: p_1 - p_2 = \delta_0$$
$$H_1: p_1 - p_2 \neq \delta_0$$

Most common, however, is the test that the two proportions are the same. That is,

$$H_0: p_A = p_B$$
$$H_a: p_A \neq p_B$$

To conduct the test, we use a pooled proportion, $p_c$.

The pooled proportion is calculated as follows:
$$p_c = \frac{x_A + x_B}{n_A + n_B}$$


The test statistic (z-score) is:
$$Z_c = \frac{(p'_A - p'_B) - \delta_0}{\sqrt{p_c(1 - p_c)\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}$$

where $\delta_0$ is the hypothesized difference between the two proportions and $p_c$ is the pooled proportion from the formula above.
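
The pooled proportion and the test statistic translate directly into code. The sketch below is illustrative rather than part of the original text: the function names are hypothetical, and the counts in the usage lines (30 of 100 versus 20 of 100) are made-up values.

```python
import math


def pooled_proportion(x_a: int, n_a: int, x_b: int, n_b: int) -> float:
    """p_c = (x_A + x_B) / (n_A + n_B)."""
    return (x_a + x_b) / (n_a + n_b)


def two_proportion_z(x_a: int, n_a: int, x_b: int, n_b: int, delta0: float = 0.0) -> float:
    """Z_c = ((p'_A - p'_B) - delta0) / sqrt(p_c (1 - p_c) (1/n_A + 1/n_B))."""
    p_a = x_a / n_a  # sample proportion for group A
    p_b = x_b / n_b  # sample proportion for group B
    p_c = pooled_proportion(x_a, n_a, x_b, n_b)
    standard_error = math.sqrt(p_c * (1 - p_c) * (1 / n_a + 1 / n_b))
    return ((p_a - p_b) - delta0) / standard_error


# Hypothetical counts: 30 successes out of 100 versus 20 out of 100.
print(pooled_proportion(30, 100, 20, 100))           # 0.25
print(round(two_proportion_z(30, 100, 20, 100), 3))  # 1.633
```

Leaving `delta0` at its default of zero gives the most common case, the test that the two proportions are equal.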

Example 10.6

Problem

A bank has recently acquired a new branch and thus has customers in this new territory. They are interested in the default rate in their new territory. They wish to test the hypothesis that the default rate is different from their current customer base. They sample 200 files in area A, their current customers, and find that 20 have defaulted. In area B, the new customers, another sample of 200 files shows 12 have defaulted on their loans. At a 10% level of significance can we say that the default rates are the same or different?
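
One way to work this problem numerically is sketched below. It assumes a two-tailed test of $H_0: p_A = p_B$ against $H_a: p_A \neq p_B$ with $\delta_0 = 0$, which matches the wording "same or different"; the variable names are illustrative, and `statistics.NormalDist` from the Python standard library stands in for a normal table.

```python
import math
from statistics import NormalDist

# Data from the problem statement.
x_a, n_a = 20, 200  # defaults among current customers (area A)
x_b, n_b = 12, 200  # defaults among new customers (area B)
alpha = 0.10        # level of significance

p_a = x_a / n_a                  # 0.10
p_b = x_b / n_b                  # 0.06
p_c = (x_a + x_b) / (n_a + n_b)  # pooled proportion, 0.08

standard_error = math.sqrt(p_c * (1 - p_c) * (1 / n_a + 1 / n_b))
z = (p_a - p_b) / standard_error  # delta0 = 0 under H0

# Two-tailed critical value and p-value from the standard normal distribution.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject = abs(z) > z_critical

print(f"z = {z:.2f}, critical value = {z_critical:.3f}, p-value = {p_value:.2f}")
print("Reject H0" if reject else "Cannot reject H0")
```

With these data the test statistic is about 1.47, which falls inside the critical values of about ±1.645, so at the 10% level the sample does not provide enough evidence to conclude that the two default rates differ.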

Try It 10.6

Two types of valves are being tested to determine if there is a difference in pressure tolerances. Fifteen out of a random sample of 100 of Valve A cracked under 4,500 psi. Six out of a random sample of 100 of Valve B cracked under 4,500 psi. Test at a 5% level of significance.

Citation/Attribution

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/introductory-business-statistics-2e/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/introductory-business-statistics-2e/pages/1-introduction

© Jul 18, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.