Statistics

Introduction

This is a photo of M&Ms piled together. The M&Ms are red, blue, green, yellow, orange and brown.
Figure 8.1 Have you ever wondered what the average number of chocolate candies in a bag at the grocery store is? You can use confidence intervals to answer this question. (credit: comedy_nose/flickr)

Chapter Objectives

By the end of this chapter, the student should be able to do the following:

  • Calculate and interpret confidence intervals for estimating a population mean and a population proportion
  • Interpret the Student's t probability distribution as the sample size changes
  • Discriminate between problems applying the normal and the Student's t-distributions
  • Calculate the sample size required to estimate a population mean and a population proportion, given a desired confidence level and margin of error

Suppose you were trying to determine the mean rent of a two-bedroom apartment in your town. You might look in the classified section of the newspaper, write down several rents listed, and average them together. You would have obtained a point estimate of the true mean. If you are trying to determine the percentage of times you make a basket when shooting a basketball, you might count the number of shots you make and divide that by the number of shots you attempt. In this case, you would have obtained a point estimate for the true proportion.
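As a minimal sketch of the two point estimates described above, the following Python snippet averages a handful of rents and computes a shooting percentage. The rent values and the shot counts are hypothetical numbers invented for illustration; they do not come from any real data set.

```python
# Hypothetical rents (dollars per month) written down from classified listings.
rents = [1150, 1200, 1325, 1275, 1400]

# The sample average is a point estimate of the true mean rent.
mean_rent = sum(rents) / len(rents)

# Hypothetical shooting record.
shots_made = 14
shots_attempted = 20

# The sample proportion is a point estimate of the true proportion of baskets made.
shooting_proportion = shots_made / shots_attempted

print("Point estimate of mean rent:", mean_rent)
print("Point estimate of shooting proportion:", shooting_proportion)
```

Each result is a single number, which is why it is called a point estimate.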

We use sample data to make generalizations about an unknown population. This part of statistics is called inferential statistics. The sample data help us to make an estimate of a population parameter. We realize that the point estimate is most likely not the exact value of the population parameter, but close to it. After calculating point estimates, we construct interval estimates, called confidence intervals.

In this chapter, you will learn to construct and interpret confidence intervals. You will also learn a new distribution, the Student's t-distribution, and how it is used with those intervals. Throughout the chapter, it is important to keep in mind that the confidence interval is a random variable. It is the population parameter that is fixed.

If you worked in the marketing department of an entertainment company, you might be interested in the mean number of songs a consumer downloads a month from an internet music store. If so, you could conduct a survey and calculate the sample mean, $\bar{x}$, and the sample standard deviation, s. You would use $\bar{x}$ to estimate the population mean and s to estimate the population standard deviation. The sample mean, $\bar{x}$, is the point estimate for the population mean, μ. The sample standard deviation, s, is the point estimate for the population standard deviation, σ.

Each of $\bar{x}$ and s is called a statistic.
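The two statistics can be computed with Python's standard `statistics` module, as in the sketch below. The list of monthly download counts is a hypothetical survey result made up for illustration. Note that `statistics.stdev` uses the sample formula (dividing by n − 1), which is the version used for s.

```python
import statistics

# Hypothetical survey responses: songs downloaded per month by six consumers.
downloads = [0, 1, 2, 2, 3, 4]

# Sample mean: the point estimate for the population mean, mu.
x_bar = statistics.mean(downloads)

# Sample standard deviation (n - 1 in the denominator):
# the point estimate for the population standard deviation, sigma.
s = statistics.stdev(downloads)

print("sample mean:", x_bar)
print("sample standard deviation:", round(s, 4))
```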

A confidence interval is another type of estimate but, instead of being just one number, it is an interval of numbers. The interval of numbers is a range of values calculated from a given set of sample data. The confidence interval is likely to include an unknown population parameter.

Suppose, for the internet music example, we do not know the population mean, μ, but we do know that the population standard deviation is σ = 1 and our sample size is 100. Then, by the central limit theorem, the standard deviation for the sample mean is

$$\frac{\sigma}{\sqrt{n}} = \frac{1}{\sqrt{100}} = 0.1.$$
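The same calculation can be sketched in a few lines of Python. The function name `standard_error` is an illustrative choice, not part of any library.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma divided by sqrt(n)."""
    return sigma / math.sqrt(n)

# Values from the internet music example: sigma = 1, n = 100.
se = standard_error(1, 100)
print("standard error:", se)
```

Because n appears under a square root in the denominator, quadrupling the sample size only halves the standard error.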

The empirical rule, which applies to bell-shaped distributions, says that in approximately 95 percent of the samples, the sample mean, $\bar{x}$, will be within two standard deviations of the population mean, μ. For our internet music example, two standard deviations would be calculated as (2)(0.1) = 0.2. The sample mean, $\bar{x}$, is likely to be within 0.2 units of μ.

In this example, we do not know the true population mean μ (because we do not have information from all the internet music users!), but we can compute the sample mean $\bar{x}$ based on our sample of 100 individuals. Because the sample mean is likely to be within 0.2 units of the true population mean 95 percent of the time that we take a sample of 100 users, we can say with 95 percent confidence that μ is within 0.2 units of $\bar{x}$. In other words, μ is somewhere between $\bar{x} - 0.2$ and $\bar{x} + 0.2$.

Suppose that from the sample of 100 internet music customers, we compute a sample mean download of $\bar{x} = 2$ songs per month. Since we know that the population standard deviation is σ = 1, according to the central limit theorem, the standard deviation for the sample means is $\frac{\sigma}{\sqrt{n}} = \frac{1}{\sqrt{100}} = 0.1$.

We know that, for about 95 percent of samples, the true population mean μ lies within two standard deviations of the sample mean. That is, with 95 percent confidence we can say that μ is between $\bar{x} - 2 \times \frac{\sigma}{\sqrt{n}}$ and $\bar{x} + 2 \times \frac{\sigma}{\sqrt{n}}$.

Replacing the symbols with their values in this example, we say that we are 95 percent confident that the true average number of songs downloaded from an internet music store per month is between

$$\bar{x} - 2 \times \frac{\sigma}{\sqrt{n}} = 2 - 2 \times \frac{1}{\sqrt{100}} = 2 - 0.2 = 1.8$$

and

$$\bar{x} + 2 \times \frac{\sigma}{\sqrt{n}} = 2 + 2 \times \frac{1}{\sqrt{100}} = 2 + 0.2 = 2.2.$$

The 95 percent confidence interval for μ is (1.8, 2.2).
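The interval can be reproduced with a short Python sketch. The function name `approx_95_ci` is hypothetical, and the multiplier 2 is the empirical-rule approximation used in this introduction rather than an exact critical value.

```python
import math

def approx_95_ci(x_bar, sigma, n):
    """Approximate 95 percent confidence interval for mu when sigma is known.

    Uses the empirical-rule multiplier of 2 standard errors.
    """
    margin_of_error = 2 * sigma / math.sqrt(n)
    return (x_bar - margin_of_error, x_bar + margin_of_error)

# Internet music example: x_bar = 2, sigma = 1, n = 100.
low, high = approx_95_ci(2, 1, 100)
print("approximate 95% CI:", (round(low, 1), round(high, 1)))
```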

The 95 percent confidence interval implies two possibilities. Either the interval (1.8, 2.2) contains the true mean, μ, or our sample produced an $\bar{x}$ that is not within 0.2 units of the true mean μ. The second possibility happens for only 5 percent of all the samples (100 percent − 95 percent).
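A small simulation can make the two possibilities concrete. The sketch below assumes, purely for illustration, that the population is normal with a true mean of μ = 2 and σ = 1; in practice μ is unknown. It repeatedly draws samples of 100, builds the interval from each sample, and counts how often the interval contains μ. The function name `coverage_rate`, the number of trials, and the seed are all arbitrary choices.

```python
import math
import random

def coverage_rate(mu, sigma, n, trials, seed=0):
    """Fraction of simulated intervals (x_bar +/- 2 standard errors) that contain mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    margin_of_error = 2 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n from the assumed normal population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # The interval moves from sample to sample; mu stays fixed.
        if x_bar - margin_of_error <= mu <= x_bar + margin_of_error:
            hits += 1
    return hits / trials

rate = coverage_rate(mu=2.0, sigma=1.0, n=100, trials=2000)
print("share of intervals containing mu:", rate)
```

The printed share should come out close to 0.95. The intervals that miss μ correspond to the second possibility described above.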

Remember that a confidence interval is created for an unknown population parameter like the population mean, μ. Confidence intervals for some parameters have the form

(point estimate – margin of error, point estimate + margin of error).

The margin of error depends on the confidence level or percentage of confidence and the standard error of the mean.

When you read newspapers and journals, you might notice that some reports use the phrase margin of error. Other reports will not use that phrase, but include a confidence interval as the point estimate plus or minus the margin of error. Those are two ways of expressing the same concept.

Note

Although the text covers only symmetrical confidence intervals, there are non-symmetrical confidence intervals (for example, a confidence interval for the standard deviation).

Collaborative Exercise

Have your instructor record the number of meals each student in your class eats out in a week. Assume that the standard deviation is known to be three meals. Construct an approximate 95 percent confidence interval for the true mean number of meals students eat out each week.

  1. Calculate the sample mean.
  2. Let σ = 3 and n = the number of students surveyed.
  3. Construct the interval $\left( \bar{x} - 2\left( \frac{\sigma}{\sqrt{n}} \right),\ \bar{x} + 2\left( \frac{\sigma}{\sqrt{n}} \right) \right)$.

We say we are approximately 95 percent confident that the true mean number of meals that students eat out in a week is between __________ and ___________.
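The three steps of the exercise can be sketched in Python as follows. The list `meals` is a hypothetical class survey of 16 students, invented only to show the mechanics; replace it with the counts your instructor records.

```python
import math

# Hypothetical responses: meals eaten out per week by each of 16 students.
meals = [2, 5, 0, 3, 4, 1, 6, 3, 2, 4, 5, 1, 3, 2, 4, 3]

# Step 1: calculate the sample mean.
x_bar = sum(meals) / len(meals)

# Step 2: sigma is assumed known (three meals); n is the number of students surveyed.
sigma = 3
n = len(meals)

# Step 3: construct the interval x_bar +/- 2 * (sigma / sqrt(n)).
margin_of_error = 2 * (sigma / math.sqrt(n))
low, high = x_bar - margin_of_error, x_bar + margin_of_error

print("sample mean:", x_bar)
print("approximate 95% CI:", (low, high))
```

For this made-up class, the sample mean is 3.0 and the margin of error is 1.5, so the interval runs from 1.5 to 4.5 meals per week.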

Citation/Attribution

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License 4.0, and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/statistics/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/statistics/pages/1-introduction
Citation information

© Apr 3, 2020 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License 4.0 license. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.