Statistics

Solutions

1.

dependent variable: fee amount

independent variable: time

3.
This is a graph of the equation y = 25 + 12.50x. The x-axis is labeled in intervals of 1 from 0 - 7; the y-axis is labeled in intervals of 25 from 0 - 100. The equation's graph is a line that crosses the y-axis at 25 and is sloped up and to the right, rising 12.50 units for every one unit of run.
Figure 12.25
5.
This is a graph of the equation y = 10 + 5x. The x-axis is labeled in intervals of 1 from 0 - 7; the y-axis is labeled in intervals of 10 from 0 - 50. The equation's graph is a line that crosses the y-axis at 10 and is sloped up and to the right, rising 5 units for every one unit of run.
Figure 12.26
7.

y = 6x + 8, 4y = 8, and y + 7 = 3x are all linear equations.

9.

The number of flu cases depends on the year. Therefore, year becomes the independent variable and the number of flu cases is the dependent variable.

11.

The y-intercept is 50 (a = 50). At the start of the cleaning, the company charges a one-time fee of $50 (this is when x = 0). The slope is 100 (b = 100). For each session, the company charges $100 for each hour they clean.

13.

12,000 lb of soil

15.

The slope is –1.5 (b = –1.5). This means the stock is losing value at a rate of $1.50 per hour. The y-intercept is $15 (a = 15). This means the price of stock before the trading day was $15.

17.
Group   x (no. of hours spent studying)   y (final exam grades)   Median x value   Median y value
1       1, 2, 3                           45, 50, 51              2                50
2       4, 5                              65, 72                  4.5              68.5
3       6, 7, 8                           80, 90, 96              7                90
Table 12.38
19.

ŷ = 2.23 + 1.99x

21.

The slope is 1.99 (b = 1.99). It means that for every endorsement deal a professional player gets, he gets an average of another $1.99 million in pay each year.

23.

It means that there is no correlation between the data sets.

25.

Yes. There are enough data points and the value of r is strong enough to show there is a strong negative correlation between the data sets.

27.

Ha: ρ ≠ 0

29.

$250,120

31.

1326 acres

33.

1125 hours, or when x = 1125

35.

Check student solution.

36.
  1. When x = 1985, ŷ = 25,525.
  2. When x = 1990, ŷ = 34,275.
  3. When x = 1970, ŷ = –725. Why doesn’t this answer make sense? The range of x values was 1981 to 2002; the year 1970 is not in this range. The regression equation does not apply, because predicting for the year 1970 is extrapolation, which requires a different process. Also, a negative number does not make sense in this context, when we are predicting flu cases diagnosed.
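The three predictions above can be checked with a short sketch. It assumes the rounded coefficients reported for this data set (intercept –3,448,225 and slope 1750) and the observed range of years, 1981 to 2002; the function and constant names are illustrative only.

```python
# Rounded regression coefficients for the flu-case data (assumed from this exercise set).
INTERCEPT = -3_448_225
SLOPE = 1750
X_MIN, X_MAX = 1981, 2002  # range of observed x values (years)


def predict_flu_cases(year):
    """Return (prediction, is_extrapolation) for the given year."""
    y_hat = INTERCEPT + SLOPE * year
    is_extrapolation = not (X_MIN <= year <= X_MAX)
    return y_hat, is_extrapolation


for year in (1985, 1990, 1970):
    y_hat, extrapolated = predict_flu_cases(year)
    note = " (outside the observed range)" if extrapolated else ""
    print(f"x = {year}: y-hat = {y_hat:,}{note}")
```

Flagging the out-of-range year alongside the value makes it harder to report an extrapolated prediction, such as the negative count for 1970, without noticing.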
38.

Also, the correlation r = 0.4526. If r is compared with the value in the 95 Percent Critical Values of the Sample Correlation Coefficient Table, because r > 0.423, r is significant, and you would think that the line could be used for prediction. But, the scatter plot indicates otherwise.

39.

Check student's solution.

40.

ŷ = –3,448,225 + 1750x

42.

There was an increase in flu cases diagnosed until 1993. From 1993 through 2002, the number of flu cases diagnosed declined each year. It is not appropriate to use a linear regression line to fit to the data.

44.

Because there is no linear association between year and number of flu cases diagnosed, it is not appropriate to calculate a linear correlation coefficient. When there is a linear association and it is appropriate to calculate a correlation, we cannot say that one variable causes the other variable.

46.

We don’t know if the pre-1981 data were collected from a single year. So, we don’t have an accurate x value for this figure.

Regression equation: ŷ (number of flu cases) = –3,448,225 + 1749.777 (year).

Coefficients
Intercept –3,448,225
x Variable 1 1,749.777
Table 12.39
47.
  • a = –3,448,225
  • b = 1,750
  • correlation = 0.4526
  • n = 22
48.

No, he is not correct. An outlier is only an influential point if it significantly impacts the slope of the least-squares regression line and the correlation coefficient, r. If omission of this data point from the calculation of the regression line does not show much impact on the slope or r-value, then the outlier is not considered an influential point. For different reasons, it still may be determined that the data point must be omitted from the data set.

49.

Yes. There appears to be an outlier at (6, 58).

51.

The potential outlier flattened the slope of the line of best fit because it was below the data set. It made the line of best fit less accurate as a predictor for the data.

53.

s = 1.75

55.
  1. independent variable: age; dependent variable: fatalities
  2. independent variable: number of family members; dependent variable: grocery bill
  3. independent variable: age of applicant; dependent variable: insurance premium
  4. independent variable: power consumption; dependent variable: utility
  5. independent variable: higher education (years); dependent variable: crime rates
58.

It means that 72 percent of the variation in the dependent variable (y) can be explained by the variation in the independent variable (x).

60.
x (SAT math scores) y (GPAs)
261 50
262 60
327 62
350 65
363 70
364 67
373 71
544 86
587 87
624 90
741 98
Table 12.40

We must remember to check the order of the y values within each group as well. We notice that the y values in the second group are not in order from the least value to the greatest value; these values thus must be reordered, meaning the median y value for that group is 70.

Group   x (SAT math scores)    y (GPAs)          Median x value   Median y value
1       261, 262, 327, 350     50, 60, 62, 65    294.5            61
2       363, 364, 373          67, 70, 71        364              70
3       544, 587, 624, 741     86, 87, 90, 98    605.5            88.5
Table 12.41

The ordered pairs are (294.5, 61), (364, 70), and (605.5, 88.5).

The slope can be calculated using the formula $m = \frac{y_3 - y_1}{x_3 - x_1}$. Substituting the median x and y values from the first and third groups gives $m = \frac{88.5 - 61}{605.5 - 294.5}$, which simplifies to $m \approx 0.09$.

The y-intercept may be found using the formula $b = \frac{\sum y - m \sum x}{3}$. The sum of the median x values is 1264, and the sum of the median y values is 219.5. Substituting these sums and the slope into the formula gives $b = \frac{219.5 - 0.09(1264)}{3}$, which simplifies to $b \approx 35.25$.

The line of best fit is represented as $y = mx + b$. Thus, the equation can be written as $y = 0.09x + 35.25$.
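The median-median calculation above can be sketched in a few lines of Python. The data are the pairs from Table 12.40, split into groups of 4, 3, and 4 as in Table 12.41; the variable names are illustrative, and the slope is rounded to two decimals before the intercept is computed because that is what the worked solution does.

```python
from statistics import median

# (x, y) pairs from Table 12.40, already sorted by x.
data = [(261, 50), (262, 60), (327, 62), (350, 65),
        (363, 70), (364, 67), (373, 71),
        (544, 86), (587, 87), (624, 90), (741, 98)]

# Group sizes 4, 3, 4, matching Table 12.41.
groups = [data[:4], data[4:7], data[7:]]

# Median x and median y are found separately within each group,
# so the y values are effectively reordered before taking the median.
medians = [(median(x for x, _ in g), median(y for _, y in g)) for g in groups]

(x1, y1), (x2, y2), (x3, y3) = medians
m = (y3 - y1) / (x3 - x1)   # slope from the first and third median points
m_rounded = round(m, 2)     # rounded as in the worked solution
b = ((y1 + y2 + y3) - m_rounded * (x1 + x2 + x3)) / 3

print(medians)
print(f"m = {m:.4f} (rounded to {m_rounded}), b = {b:.2f}")
```

Using the unrounded slope instead would shift the intercept slightly, so small differences from the printed answer are expected if the rounding step is skipped.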

61.

b. Check student solution.

c. ŷ = 35.5818045 – 0.19182491x

d. r = –0.57874
For four degrees of freedom and alpha = 0.05, the LinRegTTest gives a p value of 0.2288, so we do not reject the null hypothesis; there is not a significant linear relationship between deaths and age.
Using the table of critical values for the correlation coefficient, with four degrees of freedom, the critical value is 0.811. The correlation coefficient r = –0.57874 is not less than –0.811, so we do not reject the null hypothesis.

f. There is not a linear relationship between the two variables, as evidenced by a p value greater than 0.05.
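The test in part d rests on a t statistic computed from r and the degrees of freedom, t = r√(n − 2)/√(1 − r²). The sketch below computes only that statistic and repeats the critical-value comparison; it does not recompute the p value. The variable names are illustrative, and n = 6 is inferred from the four degrees of freedom.

```python
import math

r = -0.57874            # sample correlation coefficient from part d
df = 4                  # degrees of freedom, so n = df + 2 = 6
critical_value = 0.811  # 95 percent critical value for df = 4, as quoted in part d

# Test statistic for H0: rho = 0, where df = n - 2.
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# r is significant only if it lies beyond the critical value in either direction.
significant = abs(r) > critical_value

print(f"t = {t_stat:.3f}, significant: {significant}")
```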

63.

a. We wonder if the better discounts appear earlier in the book, so we select page as x and discount as y.

b. Check student solution.

c. ŷ = 17.21757 – 0.01412x

d. r = –0.2752
For seven degrees of freedom and alpha = 0.05, LinRegTTest gives a p value of 0.4736, so we do not reject the null hypothesis; there is not a significant linear relationship between page and discount.
Using the table of critical values for the correlation coefficient, with seven degrees of freedom, the critical value is 0.666. The correlation coefficient r = –0.2752 is not less than –0.666, so we do not reject the null hypothesis.

f. There is not a significant linear correlation so it appears there is no relationship between the page and the amount of the discount.

As the page number increases by one page, the discount decreases by $0.01412.

65.

a. Year is the independent or x variable; the number of letters is the dependent or y variable.

b. Check student’s solution.

c. No.

d. ŷ = 47.03 – 0.0216x

e. –0.4280. The r value indicates that there is not a significant correlation between the year the state entered the Union and the number of letters in the name.

g. No. The relationship does not appear to be linear; the correlation is not significant.

66.

Using LinRegTTest, the output for the original least-squares regression line is ŷ = 26.14 + 0.7539x and r = 0.6657.

The output for the new least-squares regression line, after omitting the outlier of (56, 95), is ŷ = 6.36 + 1.0045x and r = 0.9757.

The slope of the new line is quite a bit different from the slope of the original least-squares regression line, but the larger change is shown in the r-values, such that the new line has an r-value that has increased to a value that is almost equal to one.

Thus, it may be stated that the outlier (56, 95) is also an influential point.

68.

a. and b. Check student solution.

c. The slope of the regression line is –0.3031 with a y-intercept of 31.93. In context, the y-intercept indicates that when there are no returning sparrow hawks, there will be almost 32 percent new sparrow hawks, which doesn’t make sense, because if there are no returning birds, then the new percentage would have to be 100 percent (this is an example of why we do not extrapolate). The slope tells us that for each one-percent increase in returning birds, the percentage of new birds in the colony decreases by about 0.303 percent.

d. If we examine r², we see that only 57.52 percent of the variation in the percentage of new birds is explained by the model, and the correlation coefficient, r = –0.7584, indicates only a somewhat strong correlation between returning and new percentages.

e. The ordered pair (66, 6) generates the largest residual of 6.0. This means that when the observed return percentage is 66 percent, our observed new percentage, 6 percent, is almost 6 percent less than the predicted new value of 11.98 percent. If we remove this data pair, we see only an adjusted slope of –0.2789 and an adjusted intercept of 30.9816. In other words, although these data generate the largest residual, it is not an outlier, nor is the data pair an influential point.

f. If there are 70 percent returning birds, we would expect to see ŷ = –0.2789(70) + 30.9816 ≈ 11.46, or about 11.5 percent new birds in the colony.
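The arithmetic in parts d and f can be checked directly. This sketch recomputes r² from the reported r and evaluates the adjusted line from part e at 70 percent returning birds; the coefficients are the rounded values quoted above, and the variable names are illustrative.

```python
# Correlation coefficient reported in part d.
r = -0.7584
r_squared = r ** 2  # proportion of variation in y explained by the line

# Adjusted line after removing (66, 6), rounded coefficients from part e.
slope_adj = -0.2789
intercept_adj = 30.9816

# Predicted percentage of new birds when 70 percent of birds return.
predicted_new = slope_adj * 70 + intercept_adj

print(f"r^2 = {r_squared:.4f}")
print(f"predicted new birds at x = 70: {predicted_new:.2f} percent")
```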

70.
  1. Check student solution.
  2. Check student solution.
  3. We have a slope of –1.4946 with a y-intercept of 193.88. The slope, in context, indicates that for each additional minute added to the swim time, the heart rate decreases by 1.5 beats per minute. If the student is not swimming at all, the y-intercept indicates that his heart rate will be 193.88 beats per minute. Although the slope has meaning (the longer it takes to swim 2000 m, the less effort the heart puts out), the y-intercept does not make sense. If the athlete is not swimming (resting), then his heart rate should be very low.
  4. Because only 1.5 percent of the heart rate variation is explained by this regression equation, we must conclude that this association is not explained with a linear relationship.
  5. Point (34.72, 124) generates the largest residual: –11.82. This means that our observed heart rate is almost 12 beats less than our predicted rate of 136 beats per minute. When this point is removed, the slope becomes –2.953, with the y-intercept changing to 247.1616. Although the linear association is still very weak, we see that the removed data pair can be considered an influential point in the sense that the y-intercept becomes more meaningful.
72.

If we remove the two service academies (the tuition is $0.00), we construct a new regression equation of y = –0.0009x + 160, with a correlation coefficient of 0.71397 and a coefficient of determination of 0.50976. This allows us to say there is a fairly strong linear association between tuition costs and salaries if the service academies are removed from the data set.

73.

c. No. The y-intercept would occur at year 0, which doesn’t exist.

74.
  1. Check student's solution.
  2. Yes.
  3. No, the y-intercept would occur at year 0, which doesn’t exist.
  4. ŷ = −266.8863 + 0.1656x.
  5. 0.9448, yes.
  6. 62.8233, 62.3265.
  7. Yes.
  8. No, (1987, 62.7).
  9. 72.5937, no.
  10. Slope = 0.1656. As the year increases by one, the percent of workers paid hourly rates tends to increase by 0.1656.
76.
  1. Size (ounces) Cost ($) Cost per ounce (cents)
    16 3.99 24.94
    32 4.99 15.59
    64 5.99 9.36
    200 10.99 5.50
    Table 12.42
  2. Check student solution.
  3. There is a linear relationship for the sizes 16 through 64, but that linear trend does not continue to the 200-oz size.
  4. ŷ = 20.2368 – 0.0819x
  5. r = –0.8086
  6. 40-oz: 16.96 cents/oz
  7. 90-oz: 12.87 cents/oz
  8. The relationship is not linear; the least-squares line is not appropriate.
  9. There are no outliers.
  10. No. You would be extrapolating. The 300-oz size is outside the range of x.
  11. Slope = –0.08194. For each additional ounce in size, the cost per ounce decreases by 0.082 cents.
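The least-squares line and correlation coefficient in this exercise can be recovered from Table 12.42 with the standard sums-of-squares formulas. The sketch below uses the rounded cost-per-ounce values from the table, so results may differ in the last digit from calculator output; the function and variable names are illustrative.

```python
import math

# Size (ounces) and cost per ounce (cents), as rounded in Table 12.42.
sizes = [16, 32, 64, 200]
cents_per_oz = [24.94, 15.59, 9.36, 5.50]

n = len(sizes)
x_bar = sum(sizes) / n
y_bar = sum(cents_per_oz) / n

# Sums of squares and cross products about the means.
s_xx = sum((x - x_bar) ** 2 for x in sizes)
s_yy = sum((y - y_bar) ** 2 for y in cents_per_oz)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sizes, cents_per_oz))

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
r = s_xy / math.sqrt(s_xx * s_yy)


def predict(size_oz):
    """Predicted cost per ounce in cents; only meaningful for 16 to 200 oz."""
    return intercept + slope * size_oz


print(f"y-hat = {intercept:.4f} + ({slope:.4f})x, r = {r:.4f}")
print(f"40 oz: {predict(40):.2f} cents/oz, 90 oz: {predict(90):.2f} cents/oz")
```

A 300-oz prediction is deliberately left out, since that size lies outside the observed range of x.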
78.
  1. Size is x, the independent variable, and price is y, the dependent variable.
  2. Check student solution.
  3. The relationship does not appear to be linear.
  4. ŷ = –745.252 + 54.75569x.
  5. r = 0.8944 and yes, it is significant.
  6. 32-inch: $1006.93, 50-inch: $1992.53.
  7. No, the relationship does not appear to be linear. However, r is significant.
  8. No, the 60-inch TV.
  9. For each additional inch, the price increases by $54.76.
80.
  1. Rank is the independent variable and area is the dependent variable.
  2. Check student solution.
  3. There appears to be a linear relationship, with one outlier.
  4. ŷ (area) = 24177.06 + 1010.478x
  5. r = 0.50047. r is not significant, so there is no significant linear relationship between the variables.
  6. Alabama: 46,407.576 square miles, Colorado: 62,575.224 square miles.
  7. The Alabama estimate is closer than the Colorado estimate.
  8. If the outlier is removed, there is a linear relationship.
  9. There is one outlier (Hawaii).
  10. rank 51: 75,711.4 square miles, no.
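  11. State, No. of letters in name, Year entered the Union, Rank for entering the Union, Area (square miles)
    Alabama 7 1819 22 52,423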
    Colorado 8 1876 38 104,100
    Hawaii 6 1959 50 10,932
    Iowa 4 1846 29 56,276
    Maryland 8 1788 7 12,407
    Missouri 8 1821 24 69,709
    New Jersey 9 1787 3 8,722
    Ohio 4 1803 17 44,828
    South Carolina 13 1788 8 32,008
    Utah 4 1896 45 84,904
    Wisconsin 9 1848 30 65,499
    Table 12.43
  12. ŷ = –87065.3 + 7828.532x.
  13. Alabama: 85,162.404; the prior estimate was closer. Alaska is an outlier.
  14. Yes, with the exception of Hawaii.
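The comparison of the Alabama and Colorado estimates can be sketched as follows. It uses the first regression line from this exercise, ŷ = 24177.06 + 1010.478x, and assumes that the last two columns of Table 12.43 are rank and area in square miles; the names in the code are illustrative.

```python
# First regression line from this exercise: area predicted from rank.
INTERCEPT = 24177.06
SLOPE = 1010.478

# (rank, actual area in square miles), read from Table 12.43
# under the assumption that these are its last two columns.
states = {"Alabama": (22, 52_423), "Colorado": (38, 104_100)}

errors = {}
for name, (rank, actual) in states.items():
    estimate = INTERCEPT + SLOPE * rank
    errors[name] = abs(actual - estimate)
    print(f"{name}: estimate {estimate:,.3f}, actual {actual:,}, "
          f"absolute error {errors[name]:,.3f}")

# The state with the smaller absolute error has the closer estimate.
closer = min(errors, key=errors.get)
print(f"Closer estimate: {closer}")
```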
Citation/Attribution

Want to cite, share, or modify this book? This book is Creative Commons Attribution License 4.0 and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/statistics/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/statistics/pages/1-introduction
Citation information

© May 8, 2020 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License 4.0 license. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.