Introductory Business Statistics

8.3 A Confidence Interval for A Population Proportion


During an election year, we see articles in the newspaper that state confidence intervals in terms of proportions or percentages. For example, a poll for a particular candidate running for president might show that the candidate has 40% of the vote, within three percentage points (if the sample is large enough). Election polls are often calculated with 95% confidence, so the pollsters would be 95% confident that the true proportion of voters who favor the candidate is between 0.37 and 0.43.

Investors in the stock market are interested in the true proportion of stocks that go up and down each week. Businesses that sell personal computers are interested in the proportion of households in the United States that own personal computers. Confidence intervals can be calculated for the true proportion of stocks that go up or down each week and for the true proportion of households in the United States that own personal computers.

The procedure for finding the confidence interval for a population proportion is similar to that for a population mean. The formulas differ in detail but are conceptually identical: both rest on the mathematical foundation given to us by the Central Limit Theorem. Because of this, we will see the same basic format using the same three pieces of information: the sample value of the parameter in question, the standard deviation of the relevant sampling distribution, and the number of standard deviations we need to have the confidence in our estimate that we desire.

How do you know you are dealing with a proportion problem? First, the underlying distribution has a binary random variable and therefore is a binomial distribution. (There is no mention of a mean or average.) If X is a binomial random variable, then X ~ B(n, p) where n is the number of trials and p is the probability of a success. To form a sample proportion, take X, the random variable for the number of successes and divide it by n, the number of trials (or the sample size). The random variable P′ (read "P prime") is the sample proportion,

$$P' = \frac{X}{n}$$

(Sometimes the random variable is denoted as $\hat{P}$, read "P hat".)

p′ = the estimated proportion of successes, or sample proportion of successes (p′ is a point estimate for p, the true population proportion)

q′ = 1 – p′ = the estimated proportion of failures (q′ is a point estimate for q = 1 – p, the probability of a failure in any one trial)

x = the number of successes in the sample

n = the size of the sample

The formula for the confidence interval for a population proportion follows the same format as that for an estimate of a population mean. Recall from Chapter 7 that the standard deviation of the sampling distribution for the proportion was found to be:

$$\sigma_{p'} = \sqrt{\frac{p(1-p)}{n}}$$

The confidence interval for a population proportion, therefore, becomes:

$$p = p' \pm \left[ Z_{\alpha/2} \sqrt{\frac{p'(1-p')}{n}} \right]$$

$Z_{\alpha/2}$ is set according to our desired degree of confidence and $\sqrt{\frac{p'(1-p')}{n}}$ is the standard deviation of the sampling distribution.

The sample proportions p′ and q′ are estimates of the unknown population proportions p and q. The estimated proportions p′ and q′ are used because p and q are not known.
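As a minimal sketch of the formula above, the following Python function computes the interval from x successes in n trials. The function name `proportion_confidence_interval` and the sample numbers (60 successes in 100 trials) are illustrative assumptions, not taken from the text; the critical value is obtained from the standard library's `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(x, n, confidence=0.95):
    """Return (lower, upper) for the normal-approximation interval for p."""
    p_prime = x / n                          # sample proportion p'
    q_prime = 1 - p_prime                    # q' = 1 - p'
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value Z_(alpha/2)
    margin = z * sqrt(p_prime * q_prime / n)
    return p_prime - margin, p_prime + margin


# Hypothetical data: 60 successes in a sample of 100.
lower, upper = proportion_confidence_interval(60, 100, confidence=0.95)
print(f"{lower:.3f} <= p <= {upper:.3f}")
```

The function returns both limits so that the margin of error can be recovered as half their difference.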

Remember that as p moves further from 0.5, the binomial distribution becomes less symmetrical. Because we are estimating the binomial with the symmetrical normal distribution, the further the binomial departs from symmetry, the less confidence we have in the estimate.
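The loss of symmetry can be illustrated numerically. The sketch below computes the skewness of a binomial distribution directly from its probability mass function; the helper name `binomial_skewness` and the choice n = 20 are assumptions made only for illustration. A skewness of zero indicates a symmetrical distribution, and larger magnitudes indicate greater asymmetry.

```python
from math import comb


def binomial_skewness(n, p):
    """Skewness of B(n, p), computed directly from its probability mass function."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    third_moment = sum((k - mean) ** 3 * pk for k, pk in enumerate(pmf))
    return third_moment / variance**1.5


# n = 20 is an arbitrary illustrative choice.
for p in (0.5, 0.7, 0.9):
    print(f"p = {p}: skewness = {binomial_skewness(20, p):+.3f}")
```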

This conclusion can be demonstrated through the following analysis. Proportions are based upon the binomial probability distribution. The possible outcomes are binary, either “success” or “failure”. This gives rise to a proportion, meaning the percentage of the outcomes that are “successes”. It was shown that the binomial distribution could be fully understood if we knew only the probability of a success in any one trial, called p. The mean and the standard deviation of the binomial were found to be:

$$\mu = np$$
$$\sigma = \sqrt{npq}$$

It was also shown that the binomial could be estimated by the normal distribution if BOTH np AND nq were greater than 5. From the discussion above, it was found that the standardizing formula for the binomial distribution is:

$$Z = \frac{p' - p}{\sqrt{\frac{pq}{n}}}$$

which is nothing more than a restatement of the general standardizing formula with appropriate substitutions for μ and σ from the binomial. We can use the standard normal distribution, the reason Z is in the equation, because the normal distribution is the limiting distribution of the binomial. This is another example of the Central Limit Theorem. We have already seen that the sampling distribution of means is normally distributed. Recall the extended discussion in Chapter 7 concerning the sampling distribution of proportions and the conclusions of the Central Limit Theorem.
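The substitution can be spelled out. Dividing the binomial count X by n rescales both its mean and its standard deviation by 1/n, which gives the mean and standard deviation of the sample proportion and, from them, the standardizing formula (the subscripted symbols $\mu_{p'}$ and $\sigma_{p'}$ are notation introduced here for the mean and standard deviation of p′):

```latex
\begin{aligned}
\mu_{p'} &= \frac{\mu}{n} = \frac{np}{n} = p, \\
\sigma_{p'} &= \frac{\sigma}{n} = \frac{\sqrt{npq}}{n} = \sqrt{\frac{pq}{n}}, \\
Z &= \frac{p' - \mu_{p'}}{\sigma_{p'}} = \frac{p' - p}{\sqrt{\frac{pq}{n}}}.
\end{aligned}
```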

We can now manipulate this formula in just the same way we did for finding the confidence intervals for a mean, but to find the confidence interval for the binomial population parameter, p.

$$p' - Z_{\alpha/2}\sqrt{\frac{p'q'}{n}} \le p \le p' + Z_{\alpha/2}\sqrt{\frac{p'q'}{n}}$$

Here p′ = x/n is the point estimate of p taken from the sample. Notice that p′ has replaced p in the formula. This is because we do not know p; indeed, it is exactly what we are trying to estimate.
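For reference, the manipulation can be sketched as follows, writing $Z_{\alpha/2}$ for the critical value that leaves probability α/2 in each tail of the standard normal distribution. The final step, replacing p and q under the square root with p′ and q′, is the approximation just described.

```latex
\begin{aligned}
1 - \alpha &\approx P\!\left(-Z_{\alpha/2} \le \frac{p' - p}{\sqrt{pq/n}} \le Z_{\alpha/2}\right) \\
&= P\!\left(p' - Z_{\alpha/2}\sqrt{\frac{pq}{n}} \le p \le p' + Z_{\alpha/2}\sqrt{\frac{pq}{n}}\right) \\
&\approx P\!\left(p' - Z_{\alpha/2}\sqrt{\frac{p'q'}{n}} \le p \le p' + Z_{\alpha/2}\sqrt{\frac{p'q'}{n}}\right)
\end{aligned}
```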

Unfortunately, there is no correction factor for cases where the sample size is small, so np′ and nq′ must both be greater than 5 to develop an interval estimate for p.
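A small helper can make this check explicit before an interval is computed. The function name `normal_approximation_ok` and the sample counts below are hypothetical; the sketch relies on the fact that np′ is simply the number of successes x and nq′ is the number of failures n – x.

```python
def normal_approximation_ok(x, n, threshold=5):
    """Check the rule of thumb that n*p' and n*q' both exceed the threshold."""
    successes = x        # n * p' = n * (x / n) = x
    failures = n - x     # n * q' = n * (1 - x / n) = n - x
    return successes > threshold and failures > threshold


# Hypothetical samples, chosen only to illustrate the rule.
print(normal_approximation_ok(x=40, n=100))  # 40 successes, 60 failures
print(normal_approximation_ok(x=3, n=100))   # only 3 successes
```

Working with the counts directly avoids floating-point rounding when the product sits exactly on the threshold.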

Example 8.6

Suppose that a market research firm is hired to estimate the percent of adults living in a large city who have cell phones. Five hundred randomly selected adult residents in this city are surveyed to determine whether they have cell phones. Of the 500 people sampled, 421 responded yes - they own cell phones. Using a 95% confidence level, compute a confidence interval estimate for the true proportion of adult residents of this city who have cell phones.

Solution 8.6
  • The solution step-by-step.

Let X = the number of people in the sample who have cell phones. X is binomial: the random variable is binary, people either have a cell phone or they do not.

To calculate the confidence interval, we must find p′, q′.

n = 500

x = the number of successes in the sample = 421

$$p' = \frac{x}{n} = \frac{421}{500} = 0.842$$

p′ = 0.842 is the sample proportion; this is the point estimate of the population proportion.

q′ = 1 – p′ = 1 – 0.842 = 0.158

Since the requested confidence level is CL = 0.95, then α = 1 – CL = 1 – 0.95 = 0.05 and $\frac{\alpha}{2} = 0.025$.

Then $z_{\alpha/2} = z_{0.025} = 1.96$

This can be found using the Standard Normal probability table in Table 13.6. It can also be found in the Student's t table at the 0.025 column and infinite degrees of freedom, because at infinite degrees of freedom the Student's t distribution becomes the standard normal distribution, Z.
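As an alternative to a table lookup, the critical value can be computed with Python's standard library. This is a sketch only: the function name `z_critical` is an assumption, and the three confidence levels in the loop are common choices used for illustration.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Critical value Z_(alpha/2) for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence
    # inv_cdf returns the point with the given area to its left.
    return NormalDist().inv_cdf(1 - alpha / 2)


for cl in (0.90, 0.95, 0.99):
    print(f"CL = {cl:.2f}: Z = {z_critical(cl):.3f}")
```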

The confidence interval for the true binomial population proportion is

$$p' - Z_{\alpha/2}\sqrt{\frac{p'q'}{n}} \le p \le p' + Z_{\alpha/2}\sqrt{\frac{p'q'}{n}}$$

Substituting in the values from above, we find the confidence interval is $0.810 \le p \le 0.874$.
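Written out, the substitution runs as follows, with intermediate values rounded for display:

```latex
\begin{aligned}
p' \pm Z_{\alpha/2}\sqrt{\frac{p'q'}{n}}
&= 0.842 \pm 1.96\sqrt{\frac{(0.842)(0.158)}{500}} \\
&\approx 0.842 \pm 1.96\,(0.0163) \\
&\approx 0.842 \pm 0.032
\end{aligned}
```

The lower limit is therefore 0.842 – 0.032 = 0.810 and the upper limit is 0.842 + 0.032 = 0.874.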

Interpretation

We estimate with 95% confidence that between 81% and 87.4% of all adult residents of this city have cell phones.

Explanation of 95% Confidence Level

Ninety-five percent of the confidence intervals constructed in this way would contain the true value for the population proportion of all adult residents of this city who have cell phones.
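The meaning of the confidence level can be illustrated with a small simulation. The sketch below is built on an explicit assumption: it pretends that the true proportion equals the sample value 0.842, draws many samples of size 500, builds an interval from each, and counts how often the interval contains the assumed true value. The function name `coverage_rate`, the number of trials, and the random seed are all illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage_rate(p_true, n, confidence, trials, seed=0):
    """Fraction of simulated intervals that contain the assumed true proportion."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        # Simulate n yes/no responses and count the successes.
        x = sum(rng.random() < p_true for _ in range(n))
        p_prime = x / n
        margin = z * sqrt(p_prime * (1 - p_prime) / n)
        if p_prime - margin <= p_true <= p_prime + margin:
            hits += 1
    return hits / trials


# Assumption: treat 0.842 as if it were the true population proportion.
rate = coverage_rate(p_true=0.842, n=500, confidence=0.95, trials=2000)
print(f"Share of intervals containing p: {rate:.3f}")
```

The printed share should land near 0.95, though it will vary with the seed and the number of trials.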

Try It 8.6

Suppose 250 randomly selected people are surveyed to determine if they own a tablet. Of the 250 surveyed, 98 reported owning a tablet. Using a 95% confidence level, compute a confidence interval estimate for the true proportion of people who own tablets.

Example 8.7

The Dundee Dog Training School has a larger than average proportion of clients who compete in competitive professional events. A confidence interval for the population proportion of dogs that compete in professional events from 150 different training schools is constructed. The lower limit is determined to be 0.08 and the upper limit is determined to be 0.16. Determine the level of confidence used to construct the interval of the population proportion of dogs that compete in professional events.

Solution 8.7

We begin with the formula for a confidence interval for a proportion because the random variable is binary; either the client competes in professional competitive dog events or they don't.

$$p = p' \pm \left[ Z_{\alpha/2} \sqrt{\frac{p'(1-p')}{n}} \right]$$

Next we find the sample proportion:

$$p' = \frac{0.08 + 0.16}{2} = 0.12$$

The ± that makes up the confidence interval is thus 0.04; 0.12 + 0.04 = 0.16 and 0.12 − 0.04 = 0.08, the boundaries of the confidence interval. Finally, we solve for Z.

$$\left[ Z \sqrt{\frac{0.12(1 - 0.12)}{150}} \right] = 0.04, \text{ therefore } Z = 1.51$$

Then look up the probability for 1.51 standard deviations on the standard normal table.

$P(0 \le Z \le 1.51) = 0.4345$ and $2 \times 0.4345 = 0.8690$, so the level of confidence is 86.90%.
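The same back-calculation can be sketched in Python, using the limits and sample size given in the example. The variable names are illustrative, and `statistics.NormalDist` stands in for the printed standard normal table.

```python
from math import sqrt
from statistics import NormalDist

lower, upper, n = 0.08, 0.16, 150   # values given in the example

p_prime = (lower + upper) / 2       # midpoint of the interval
margin = (upper - lower) / 2        # half-width of the interval
z = margin / sqrt(p_prime * (1 - p_prime) / n)

# Area between -z and +z under the standard normal curve.
confidence = 2 * NormalDist().cdf(z) - 1
print(f"Z = {z:.2f}, confidence level = {confidence:.4f}")
```

Because Z is not rounded to 1.51 before the area is computed, the result differs slightly from the table-based figure.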

Example 8.8

A financial officer for a company wants to estimate the percent of accounts receivable that are more than 30 days overdue. He surveys 500 accounts and finds that 300 are more than 30 days overdue. Compute a 90% confidence interval for the true percent of accounts receivable that are more than 30 days overdue, and interpret the confidence interval.

Solution 8.8
  • The solution is step-by-step:

x = 300 and n = 500

$$p' = \frac{x}{n} = \frac{300}{500} = 0.600$$

$$q' = 1 - p' = 1 - 0.600 = 0.400$$

Since the confidence level is 0.90, then α = 1 – confidence level = 1 – 0.90 = 0.10 and $\frac{\alpha}{2} = 0.05$.

$Z_{\alpha/2} = Z_{0.05} = 1.645$

This Z-value can be found using a standard normal probability table. The Student's t-table can also be used by entering the table at the 0.05 column and reading at the line for infinite degrees of freedom. The t-distribution is the normal distribution at infinite degrees of freedom. This is a handy trick to remember in finding Z-values for commonly used levels of confidence. We use this formula for a confidence interval for a proportion:

$$p' - Z_{\alpha/2}\sqrt{\frac{p'q'}{n}} \le p \le p' + Z_{\alpha/2}\sqrt{\frac{p'q'}{n}}$$

Substituting in the values from above we find the confidence interval for the true binomial population proportion is 0.564 ≤ p ≤ 0.636
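Written out, the substitution runs as follows, with intermediate values rounded for display:

```latex
\begin{aligned}
p' \pm Z_{\alpha/2}\sqrt{\frac{p'q'}{n}}
&= 0.600 \pm 1.645\sqrt{\frac{(0.600)(0.400)}{500}} \\
&\approx 0.600 \pm 1.645\,(0.0219) \\
&\approx 0.600 \pm 0.036
\end{aligned}
```

The lower limit is therefore 0.600 – 0.036 = 0.564 and the upper limit is 0.600 + 0.036 = 0.636.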

Interpretation
  • We estimate with 90% confidence that the true percent of all accounts receivable that are more than 30 days overdue is between 56.4% and 63.6%.
  • Alternate Wording: We estimate with 90% confidence that between 56.4% and 63.6% of ALL accounts are more than 30 days overdue.

Explanation of 90% Confidence Level

Ninety percent of all confidence intervals constructed in this way contain the true value for the population percent of accounts receivable that are more than 30 days overdue.

Try It 8.8

A student polls her school to see if students in the school district are for or against the new legislation regarding school uniforms. She surveys 600 students and finds that 480 are against the new legislation.

a. Compute a 90% confidence interval for the true percent of students who are against the new legislation, and interpret the confidence interval.

b. In a sample of 300 students, 68% said they own an iPod and a smartphone. Compute a 97% confidence interval for the true percent of students who own an iPod and a smartphone.

Citation/Attribution

Want to cite, share, or modify this book? This book is Creative Commons Attribution License 4.0 and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/introductory-business-statistics/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/introductory-business-statistics/pages/1-introduction
Citation information

© Nov 29, 2017 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License 4.0 license. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.