Introductory Statistics

8.3 A Population Proportion

During an election year, we see articles in the newspaper that state confidence intervals in terms of proportions or percentages. For example, a poll for a particular candidate running for president might show that the candidate has 40% of the vote within three percentage points (if the sample is large enough). Often, election polls are calculated with 95% confidence, so the pollsters would be 95% confident that the true proportion of voters who favored the candidate would be between 0.37 and 0.43: (0.40 – 0.03, 0.40 + 0.03).

Investors in the stock market are interested in the true proportion of stocks that go up and down each week. Businesses that sell personal computers are interested in the proportion of households in the United States that own personal computers. Confidence intervals can be calculated for the true proportion of stocks that go up or down each week and for the true proportion of households in the United States that own personal computers.

The procedure to find the confidence interval, the sample size, the error bound, and the confidence level for a proportion is similar to that for the population mean, but the formulas are different.

How do you know you are dealing with a proportion problem? First, the underlying distribution is a binomial distribution. (There is no mention of a mean or average.) If X is a binomial random variable, then X ~ B(n, p) where n is the number of trials and p is the probability of a success. To form a proportion, take X, the random variable for the number of successes and divide it by n, the number of trials (or the sample size). The random variable P′ (read "P prime") is that proportion,

$P' = \frac{X}{n}$

(Sometimes the random variable is denoted as $\hat{P}$, read "P hat".)

When n is large and p is not close to zero or one, we can use the normal distribution to approximate the binomial.

$X \sim N\left(np, \sqrt{npq}\right)$

If we divide the random variable, the mean, and the standard deviation by n, we get a normal distribution of proportions with P′, called the estimated proportion, as the random variable. (Recall that a proportion is the number of successes divided by n.)

$\frac{X}{n} = P' \sim N\left(\frac{np}{n}, \frac{\sqrt{npq}}{n}\right)$

Using algebra to simplify: $\frac{\sqrt{npq}}{n} = \sqrt{\frac{pq}{n}}$

P′ follows a normal distribution for proportions: $\frac{X}{n} = P' \sim N\left(p, \sqrt{\frac{pq}{n}}\right)$
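To see why P′ is approximately normal with mean p and standard deviation $\sqrt{pq/n}$, the sketch below (Python 3.9+, standard library only) simulates many samples of size n = 500 from a population whose true proportion is assumed to be p = 0.84. Both values are illustrative assumptions, and `simulate_sample_proportions` is a hypothetical helper name. The simulated mean and standard deviation should land close to the theoretical values, though the exact digits depend on the random seed.

```python
import math
import random
import statistics


def simulate_sample_proportions(n: int, p: float, reps: int, seed: int = 0) -> list[float]:
    """Draw `reps` samples of n trials and return each sample proportion X/n."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    proportions = []
    for _ in range(reps):
        # X = number of successes in n independent trials, each a success with probability p
        x = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(x / n)
    return proportions


n, p = 500, 0.84   # illustrative assumptions, not data from the text
q = 1 - p
props = simulate_sample_proportions(n, p, reps=1000)

# Theory: P' has mean p and standard deviation sqrt(npq)/n, which simplifies to sqrt(pq/n)
theoretical_sd = math.sqrt(n * p * q) / n
print(f"mean of P': {statistics.fmean(props):.4f} (theory: {p})")
print(f"sd of P':   {statistics.stdev(props):.4f} (theory: {theoretical_sd:.4f})")
```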

The confidence interval has the form (p′ – EBP, p′ + EBP). EBP is the error bound for the proportion.

$p' = \frac{x}{n}$

p′ = the estimated proportion of successes (p′ is a point estimate for p, the true proportion.)

x = the number of successes

n = the size of the sample

The error bound for a proportion is

$EBP = \left(z_{\alpha/2}\right)\left(\sqrt{\frac{p'q'}{n}}\right)$ where q′ = 1 – p′
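The error bound formula translates directly into code. This is a minimal sketch assuming Python 3.8+; `error_bound_proportion` is a hypothetical helper name, and `statistics.NormalDist().inv_cdf` is used in place of a table or calculator lookup for $z_{\alpha/2}$.

```python
import math
from statistics import NormalDist


def error_bound_proportion(x: int, n: int, confidence_level: float) -> float:
    """EBP = z_{alpha/2} * sqrt(p'q'/n) for a sample with x successes out of n."""
    p_hat = x / n            # p' = estimated proportion of successes
    q_hat = 1 - p_hat        # q' = 1 - p' = estimated proportion of failures
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_{alpha/2}
    return z * math.sqrt(p_hat * q_hat / n)


print(round(error_bound_proportion(421, 500, 0.95), 3))
```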

This formula is similar to the error bound formula for a mean, except that the "appropriate standard deviation" is different. For a mean, when the population standard deviation is known, the appropriate standard deviation that we use is $\frac{\sigma}{\sqrt{n}}$. For a proportion, the appropriate standard deviation is $\sqrt{\frac{pq}{n}}$.

However, in the error bound formula, we use $\sqrt{\frac{p'q'}{n}}$ as the standard deviation, instead of $\sqrt{\frac{pq}{n}}$.

In the error bound formula, the sample proportions p′ and q′ are estimates of the unknown population proportions p and q. The estimated proportions p′ and q′ are used because p and q are not known. The sample proportions p′ and q′ are calculated from the data: p′ is the estimated proportion of successes, and q′ is the estimated proportion of failures.

The confidence interval can be used only if the number of successes np′ and the number of failures nq′ are both greater than five.
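The full interval, including the success/failure check stated above, can be sketched as follows (Python 3.9+, standard library only). The function name `proportion_confidence_interval` is hypothetical, and raising an error when the condition fails is an illustrative design choice, not something the text prescribes.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(x: int, n: int, confidence_level: float) -> tuple[float, float]:
    """Return (p' - EBP, p' + EBP) for x successes in n trials."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # Rule from the text: n*p' (successes) and n*q' (failures) must both exceed five
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("normal approximation not appropriate: need n*p' > 5 and n*q' > 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)  # z_{alpha/2}
    ebp = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - ebp, p_hat + ebp


try:
    # Hypothetical sample: only 3 successes in 40 trials, so n*p' = 3
    proportion_confidence_interval(3, 40, 0.95)
except ValueError as err:
    print(err)
```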

Note

For the normal distribution of proportions, the z-score formula is as follows.

If $P' \sim N\left(p, \sqrt{\frac{pq}{n}}\right)$, then the z-score formula is $z = \frac{p' - p}{\sqrt{\frac{pq}{n}}}$
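A one-line Python sketch of this z-score formula follows. The helper name `z_score_proportion` and the population proportion p = 0.80 used in the usage line are hypothetical, chosen only to show the calculation.

```python
import math


def z_score_proportion(p_hat: float, p: float, n: int) -> float:
    """z = (p' - p) / sqrt(pq/n), where p is the population proportion and q = 1 - p."""
    q = 1 - p
    return (p_hat - p) / math.sqrt(p * q / n)


# Hypothetical: how many standard deviations is p' = 0.842 above an assumed p = 0.80 when n = 500?
print(round(z_score_proportion(0.842, 0.80, 500), 3))
```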

Example 8.10

Suppose that a market research firm is hired to estimate the percent of adults living in a large city who have cell phones. Five hundred randomly selected adult residents in this city are surveyed to determine whether they have cell phones. Of the 500 people surveyed, 421 responded yes: they own cell phones. Using a 95% confidence level, compute a confidence interval estimate for the true proportion of adult residents of this city who have cell phones.

Solution 8.10
  • The first solution is step-by-step (Solution A).
  • The second solution uses a function of the TI-83, 83+, or 84 calculators (Solution B).

Solution A

Let X = the number of people in the sample who have cell phones. X is binomial: $X \sim B\left(500, \frac{421}{500}\right)$.

To calculate the confidence interval, you must find p′, q′, and EBP.

n = 500

x = the number of successes = 421

$p' = \frac{x}{n} = \frac{421}{500} = 0.842$

p′ = 0.842 is the sample proportion; this is the point estimate of the population proportion.

q′ = 1 – p′ = 1 – 0.842 = 0.158

Since CL = 0.95, then α = 1 – CL = 1 – 0.95 = 0.05 and $\frac{\alpha}{2}$ = 0.025.

Then $z_{\alpha/2} = z_{0.025} = 1.96$

Use the TI-83, 83+, or 84+ calculator command invNorm(0.975,0,1) to find $z_{0.025}$. Remember that the area to the right of $z_{0.025}$ is 0.025 and the area to the left of $z_{0.025}$ is 0.975. This can also be found using appropriate commands on other calculators, using a computer, or using a standard normal probability table.
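On a computer, Python's standard-library `statistics.NormalDist.inv_cdf` (Python 3.8+) plays the same role as invNorm: it returns the z-score with a given area to its left. The wrapper `critical_value` below is a hypothetical helper for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def critical_value(confidence_level: float) -> float:
    """Return z_{alpha/2}, the z-score with area 1 - alpha/2 to its left (like invNorm(1 - alpha/2, 0, 1))."""
    alpha = 1 - confidence_level
    return standard_normal.inv_cdf(1 - alpha / 2)


for cl in (0.90, 0.95):
    print(f"CL = {cl:.2f}: z = {critical_value(cl):.3f}")
```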

$EBP = \left(z_{\alpha/2}\right)\sqrt{\frac{p'q'}{n}} = (1.96)\sqrt{\frac{(0.842)(0.158)}{500}} = 0.032$

$p' - EBP = 0.842 - 0.032 = 0.810$

$p' + EBP = 0.842 + 0.032 = 0.874$

The confidence interval for the true binomial population proportion is (p′ – EBP, p′ + EBP) = (0.810, 0.874).
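The steps of Solution A can be checked with a short Python sketch (standard library only). It uses the exact critical value from `NormalDist().inv_cdf(0.975)` rather than the rounded 1.96; carrying full precision reproduces the calculator output shown in Solution B.

```python
import math
from statistics import NormalDist

x, n = 421, 500
p_hat = x / n                       # p' = 0.842
q_hat = 1 - p_hat                   # q' = 0.158
z = NormalDist().inv_cdf(0.975)     # z_{0.025}, about 1.96
ebp = z * math.sqrt(p_hat * q_hat / n)

lower, upper = p_hat - ebp, p_hat + ebp
print(f"p' = {p_hat:.3f}, EBP = {ebp:.3f}, CI = ({lower:.3f}, {upper:.3f})")  # p' = 0.842, EBP = 0.032, CI = (0.810, 0.874)
print(f"full precision: ({lower:.5f}, {upper:.5f})")
```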

Interpretation

We estimate with 95% confidence that between 81% and 87.4% of all adult residents of this city have cell phones.

Explanation of 95% Confidence Level

Ninety-five percent of the confidence intervals constructed in this way would contain the true value for the population proportion of all adult residents of this city who have cell phones.
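The "95% of intervals constructed in this way" statement can be checked by simulation. This sketch assumes a hypothetical true proportion p = 0.84 and n = 500, draws many samples, builds p′ ± EBP for each, and reports the fraction of intervals that contain p. The helper `coverage_rate` is an illustrative name; expect a value near, but not exactly, 95%.

```python
import math
import random
from statistics import NormalDist


def coverage_rate(p: float, n: int, confidence_level: float, reps: int, seed: int = 1) -> float:
    """Fraction of simulated intervals p' +/- EBP that contain the true proportion p."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    hits = 0
    for _ in range(reps):
        x = sum(1 for _ in range(n) if rng.random() < p)   # successes in one sample
        p_hat = x / n
        ebp = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - ebp <= p <= p_hat + ebp:
            hits += 1
    return hits / reps


rate = coverage_rate(p=0.84, n=500, confidence_level=0.95, reps=1000)
print(f"{rate:.1%} of the simulated intervals contain p")
```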

Solution 8.10

Solution B

Using the TI-83, 83+, 84, 84+ Calculator

Press STAT and arrow over to TESTS.
Arrow down to A:1-PropZInt. Press ENTER.
Arrow down to x and enter 421.
Arrow down to n and enter 500.
Arrow down to C-Level and enter .95.
Arrow down to Calculate and press ENTER.
The confidence interval is (0.81003, 0.87397).

Try It 8.10

Suppose 250 randomly selected people are surveyed to determine if they own a tablet. Of the 250 surveyed, 98 reported owning a tablet. Using a 95% confidence level, compute a confidence interval estimate for the true proportion of people who own tablets.

Example 8.11

For a class project, a political science student at a large university wants to estimate the percent of students who are registered voters. He surveys 500 students and finds that 300 are registered voters. Compute a 90% confidence interval for the true percent of students who are registered voters, and interpret the confidence interval.

Solution 8.11
  • The first solution is step-by-step (Solution A).
  • The second solution uses a function of the TI-83, 83+, or 84 calculators (Solution B).

Solution A

x = 300 and n = 500

$p' = \frac{x}{n} = \frac{300}{500} = 0.600$

$q' = 1 - p' = 1 - 0.600 = 0.400$

Since CL = 0.90, then α = 1 – CL = 1 – 0.90 = 0.10 and $\frac{\alpha}{2}$ = 0.05.

$z_{\alpha/2} = z_{0.05} = 1.645$

Use the TI-83, 83+, or 84+ calculator command invNorm(0.95,0,1) to find $z_{0.05}$. Remember that the area to the right of $z_{0.05}$ is 0.05 and the area to the left of $z_{0.05}$ is 0.95. This can also be found using appropriate commands on other calculators, using a computer, or using a standard normal probability table.

$EBP = \left(z_{\alpha/2}\right)\sqrt{\frac{p'q'}{n}} = (1.645)\sqrt{\frac{(0.60)(0.40)}{500}} = 0.036$

$p' - EBP = 0.60 - 0.036 = 0.564$

$p' + EBP = 0.60 + 0.036 = 0.636$

The confidence interval for the true binomial population proportion is (p′ – EBP, p′ + EBP) = (0.564, 0.636).
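A short Python sketch (standard library only) reproduces Solution A. It uses the exact critical value from `NormalDist().inv_cdf(0.95)` instead of the table value 1.645; the rounded interval is the same.

```python
import math
from statistics import NormalDist

x, n, confidence_level = 300, 500, 0.90
p_hat = x / n                              # p' = 0.600
alpha = 1 - confidence_level               # 0.10
z = NormalDist().inv_cdf(1 - alpha / 2)    # z_{0.05}, about 1.645
ebp = z * math.sqrt(p_hat * (1 - p_hat) / n)
print(f"EBP = {ebp:.3f}; CI = ({p_hat - ebp:.3f}, {p_hat + ebp:.3f})")  # EBP = 0.036; CI = (0.564, 0.636)
```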

Interpretation
  • We estimate with 90% confidence that the true percent of all students that are registered voters is between 56.4% and 63.6%.
  • Alternate Wording: We estimate with 90% confidence that between 56.4% and 63.6% of ALL students are registered voters.

Explanation of 90% Confidence Level

Ninety percent of all confidence intervals constructed in this way contain the true value for the population percent of students that are registered voters.

Solution 8.11

Solution B

Using the TI-83, 83+, 84, 84+ Calculator

Press STAT and arrow over to TESTS.
Arrow down to A:1-PropZInt. Press ENTER.
Arrow down to x and enter 300.
Arrow down to n and enter 500.
Arrow down to C-Level and enter 0.90.
Arrow down to Calculate and press ENTER.
The confidence interval is (0.564, 0.636).

Try It 8.11

A student polls her school to see if students in the school district are for or against the new legislation regarding school uniforms. She surveys 600 students and finds that 480 are against the new legislation.

a. Compute a 90% confidence interval for the true percent of students who are against the new legislation, and interpret the confidence interval.

b. In a sample of 300 students, 68% said they own an iPod and a smartphone. Compute a 97% confidence interval for the true percent of students who own an iPod and a smartphone.

“Plus Four” Confidence Interval for p

There is a certain amount of error introduced into the process of calculating a confidence interval for a proportion. Because we do not know the true proportion for the population, we are forced to use point estimates to calculate the appropriate standard deviation of the sampling distribution. Studies have shown that the resulting estimation of the standard deviation can be flawed.

Fortunately, there is a simple adjustment that allows us to produce more accurate confidence intervals. We simply pretend that we have four additional observations. Two of these observations are successes and two are failures. The new sample size, then, is n + 4, and the new count of successes is x + 2.
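The plus-four adjustment only changes the inputs; the rest of the calculation is the same as before. A minimal Python sketch (3.9+, standard library only), where `plus_four_interval` is a hypothetical helper name and the guard encodes the usage guideline given in the next paragraph (confidence level at least 90%, sample size at least ten):

```python
import math
from statistics import NormalDist


def plus_four_interval(x: int, n: int, confidence_level: float) -> tuple[float, float]:
    """Plus-four confidence interval for a proportion, given the observed x successes in n trials."""
    # Guideline from the text: use when CL is at least 90% and the sample size is at least ten
    if confidence_level < 0.90 or n < 10:
        raise ValueError("plus-four is recommended only for CL >= 0.90 and n >= 10")
    # Pretend there are four extra observations: two successes and two failures
    x_adj, n_adj = x + 2, n + 4
    p_hat = x_adj / n_adj
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)   # z_{alpha/2}
    ebp = z * math.sqrt(p_hat * q_hat / n_adj)
    return p_hat - ebp, p_hat + ebp
```

The function takes the raw observed counts and applies the +2/+4 adjustment itself, so x and n should be passed as observed.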

Computer studies have demonstrated the effectiveness of this method. It should be used when the confidence level desired is at least 90% and the sample size is at least ten.

Example 8.12

A random sample of 25 statistics students was asked: “Have you smoked a cigarette in the past week?” Six students reported smoking within the past week. Use the “plus-four” method to find a 95% confidence interval for the true proportion of statistics students who smoke.

Solution 8.12

Solution A

Six students out of 25 reported smoking within the past week, so x = 6 and n = 25. Because we are using the “plus-four” method, we will use x = 6 + 2 = 8 and n = 25 + 4 = 29.

$p' = \frac{x}{n} = \frac{8}{29} \approx 0.276$

$q' = 1 - p' = 1 - 0.276 = 0.724$

Since CL = 0.95, we know α = 1 – 0.95 = 0.05 and $\frac{\alpha}{2}$ = 0.025.

$z_{0.025} = 1.96$

$EBP = \left(z_{\alpha/2}\right)\sqrt{\frac{p'q'}{n}} = (1.96)\sqrt{\frac{0.276(0.724)}{29}} \approx 0.163$

p′ – EBP = 0.276 – 0.163 = 0.113

p′ + EBP = 0.276 + 0.163 = 0.439

We are 95% confident that the true proportion of all statistics students who smoke cigarettes is between 0.113 and 0.439.
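The arithmetic in Solution A can be checked in a few lines of Python; z = 1.96 is the rounded table value used above.

```python
import math

x, n = 6 + 2, 25 + 4          # plus-four: two extra successes, four extra observations
p_hat = x / n                 # about 0.276
q_hat = 1 - p_hat             # about 0.724
z = 1.96                      # z_{0.025} from the text
ebp = z * math.sqrt(p_hat * q_hat / n)
print(f"EBP = {ebp:.3f}; CI = ({p_hat - ebp:.3f}, {p_hat + ebp:.3f})")  # EBP = 0.163; CI = (0.113, 0.439)
```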

Solution 8.12

Solution B

Using the TI-83, 83+, 84, 84+ Calculator

Press STAT and arrow over to TESTS.
Arrow down to A:1-PropZInt. Press ENTER.

Reminder

Remember that the plus-four method assumes an additional four trials: two successes and two failures. You do not need to change the process for calculating the confidence interval; simply update the values of x and n to reflect these additional trials.

Arrow down to x and enter 8.
Arrow down to n and enter 29.
Arrow down to C-Level and enter 0.95.
Arrow down to Calculate and press ENTER.
The confidence interval is (0.113, 0.439).

Try It 8.12

Out of a random sample of 65 freshmen at State University, 31 students have declared a major. Use the “plus-four” method to find a 96% confidence interval for the true proportion of freshmen at State University who have declared a major.

Example 8.13

The Berkman Center for Internet & Society at Harvard recently conducted a study analyzing the privacy management habits of teen internet users. In a group of 50 teens, 13 reported having more than 500 friends on Facebook. Use the “plus four” method to find a 90% confidence interval for the true proportion of teens who would report having more than 500 Facebook friends.

Solution 8.13

Solution A

Using “plus-four,” we have x = 13 + 2 = 15 and n = 50 + 4 = 54.

$p' = \frac{15}{54} \approx 0.278$

$q' = 1 - p' = 1 - 0.278 = 0.722$

Since CL = 0.90, we know α = 1 – 0.90 = 0.10 and $\frac{\alpha}{2}$ = 0.05.

$z_{0.05} = 1.645$

$EBP = \left(z_{\alpha/2}\right)\left(\sqrt{\frac{p'q'}{n}}\right) = (1.645)\left(\sqrt{\frac{(0.278)(0.722)}{54}}\right) \approx 0.100$

p′ – EBP = 0.278 – 0.100 = 0.178

p′ + EBP = 0.278 + 0.100 = 0.378

We are 90% confident that between 17.8% and 37.8% of all teens would report having more than 500 friends on Facebook.
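Again, a few lines of Python confirm the arithmetic, using the table value z = 1.645 and noting that q′ = 1 – p′ with p′ ≈ 0.278.

```python
import math

x, n = 13 + 2, 50 + 4         # plus-four adjustment
p_hat = x / n                 # about 0.278
q_hat = 1 - p_hat             # about 0.722
z = 1.645                     # z_{0.05} for 90% confidence
ebp = z * math.sqrt(p_hat * q_hat / n)
print(f"EBP = {ebp:.3f}; CI = ({p_hat - ebp:.3f}, {p_hat + ebp:.3f})")  # EBP = 0.100; CI = (0.178, 0.378)
```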

Solution 8.13

Solution B

Using the TI-83, 83+, 84, 84+ Calculator


Press STAT and arrow over to TESTS.
Arrow down to A:1-PropZInt. Press ENTER.
Arrow down to x and enter 15.
Arrow down to n and enter 54.
Arrow down to C-Level and enter 0.90.
Arrow down to Calculate and press ENTER.
The confidence interval is (0.178, 0.378).

Try It 8.13

The Berkman Center Study referenced in Example 8.13 talked to teens in smaller focus groups, but also interviewed additional teens over the phone. When the study was complete, 588 teens had answered the question about their Facebook friends with 159 saying that they have more than 500 friends. Use the “plus-four” method to find a 90% confidence interval for the true proportion of teens that would report having more than 500 Facebook friends based on this larger sample. Compare the results to those in Example 8.13.

Calculating the Sample Size n

If researchers desire a specific margin of error, then they can use the error bound formula to calculate the required sample size.

The error bound formula for a population proportion is

  • $EBP = \left(z_{\alpha/2}\right)\left(\sqrt{\frac{p'q'}{n}}\right)$
  • Solving for n gives you an equation for the sample size.
  • $n = \frac{\left(z_{\alpha/2}\right)^2 (p'q')}{EBP^2}$
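The sample-size formula can be sketched in Python (standard library only). The helper name `required_sample_size` is hypothetical; it defaults to p′ = 0.5 when no estimate is available and rounds up with `math.ceil`, since a fractional sample size must be increased to the next whole number.

```python
import math
from statistics import NormalDist


def required_sample_size(confidence_level: float, ebp: float, p_hat: float = 0.5) -> int:
    """n = z^2 * p'q' / EBP^2, rounded up to the next whole number."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)   # z_{alpha/2}
    q_hat = 1 - p_hat
    return math.ceil(z ** 2 * p_hat * q_hat / ebp ** 2)


# Example settings: 95% confidence, error bound of three percentage points, p' = 0.5
print(required_sample_size(0.95, 0.03))  # 1068
```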

Example 8.14

Suppose a mobile phone company wants to determine the current percentage of customers aged 50+ who use text messaging on their cell phones. How many customers aged 50+ should the company survey in order to be 90% confident that the estimated (sample) proportion is within three percentage points of the true population proportion of customers aged 50+ who use text messaging on their cell phones?

Solution 8.14

From the problem, we know that EBP = 0.03 (3% = 0.03) and $z_{\alpha/2} = z_{0.05} = 1.645$ because the confidence level is 90%.

However, in order to find n, we need to know the estimated (sample) proportion p′. Remember that q′ = 1 – p′. But we do not know p′ yet. Since we multiply p′ and q′ together, we make them both equal to 0.5 because p′q′ = (0.5)(0.5) = 0.25 results in the largest possible product. (Try other products: (0.6)(0.4) = 0.24; (0.3)(0.7) = 0.21; (0.2)(0.8) = 0.16 and so on.) The largest possible product gives us the largest n. This gives us a large enough sample so that we can be 90% confident that we are within three percentage points of the true population proportion. To calculate the sample size n, use the formula and make the substitutions.
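Algebraically, $p'(1 - p') = \frac{1}{4} - \left(p' - \frac{1}{2}\right)^2$, so the product is largest when p′ = 0.5. The short Python sketch below tabulates the products from the text and searches a grid of p′ values to confirm this numerically.

```python
# Tabulate the product p'q' = p'(1 - p') for several values of p'
for p_hat in (0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8):
    print(f"p' = {p_hat:.1f}  p'q' = {p_hat * (1 - p_hat):.2f}")

# Search p' = 0.00, 0.01, ..., 1.00 for the largest product
best = max((k / 100 for k in range(101)), key=lambda ph: ph * (1 - ph))
print("largest product at p' =", best)
```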

$n = \frac{z^2 p'q'}{EBP^2}$ gives $n = \frac{1.645^2 (0.5)(0.5)}{0.03^2} = 751.7$

Round the answer to the next higher value. The sample size should be 752 cell phone customers aged 50+ in order to be 90% confident that the estimated (sample) proportion is within three percentage points of the true population proportion of all customers aged 50+ who use text messaging on their cell phones.

Try It 8.14

Suppose an internet marketing company wants to determine the current percentage of customers who click on ads on their smartphones. How many customers should the company survey in order to be 90% confident that the estimated proportion is within five percentage points of the true population proportion of customers who click on ads on their smartphones?

Citation/Attribution

Want to cite, share, or modify this book? This book is Creative Commons Attribution License 4.0 and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/introductory-statistics/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/introductory-statistics/pages/1-introduction
Citation information

© Sep 19, 2013 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License 4.0 license. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.