13.1 The Correlation Coefficient r

1.

In order to have a correlation coefficient between traits A and B, it is necessary to have:

  1. one group of subjects, some of whom possess characteristics of trait A, the remainder possessing those of trait B
  2. measures of trait A on one group of subjects and of trait B on another group
  3. two groups of subjects, one which could be classified as A or not A, the other as B or not B
  4. two groups of subjects, one which could be classified as A or not A, the other as B or not B
2.

Define the Correlation Coefficient and give a unique example of its use.

3.

If the correlation between age of an auto and money spent for repairs is +.90, then:

  1. 81% of the variation in the money spent for repairs is explained by the age of the auto
  2. 81% of money spent for repairs is unexplained by the age of the auto
  3. 90% of the money spent for repairs is explained by the age of the auto
  4. none of the above
4.

Suppose that college grade-point average and verbal portion of an IQ test had a correlation of .40. What percentage of the variance do these two have in common?

  1. 20
  2. 16
  3. 40
  4. 80
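Questions 3 and 4 both turn on the coefficient of determination, which is the square of r. The arithmetic can be sketched in Python as follows; the helper name `shared_variance_pct` is hypothetical and used only for illustration.

```python
def shared_variance_pct(r):
    """Percentage of variance two variables share, given their correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    return 100.0 * r * r


# The two correlations quoted in questions 3 and 4.
for r in (0.90, 0.40):
    pct = shared_variance_pct(r)
    print(f"r = {r:+.2f} -> r^2 = {pct:.0f}% of the variance in common")
```

Squaring also removes the sign of r, so a correlation of -.90 would share the same percentage of variance as +.90.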
5.

True or false? If false, explain why: The coefficient of determination can have values between -1 and +1.

6.

True or False: Whenever r is calculated on the basis of a sample, the value which we obtain for r is only an estimate of the true correlation coefficient which we would obtain if we calculated it for the entire population.

7.

Under a "scatter diagram" there is a notation that the coefficient of correlation is .10. What does this mean?

  1. plus and minus 10% from the means includes about 68% of the cases
  2. one-tenth of the variance of one variable is shared with the other variable
  3. one-tenth of one variable is caused by the other variable
  4. on a scale from -1 to +1, the degree of linear relationship between the two variables is +.10
8.

The correlation coefficient for X and Y is known to be zero. We then can conclude that:

  1. X and Y have standard distributions
  2. the variances of X and Y are equal
  3. there exists no relationship between X and Y
  4. there exists no linear relationship between X and Y
  5. none of these
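To see why a zero correlation coefficient speaks only to linear association, consider a small sketch with made-up data in which Y depends on X exactly, but through a curve. The data and the helper `pearson_r` are illustrative assumptions, not part of the exercise.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient computed from deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: Y is completely determined by X (Y = X squared),
# yet the relationship is symmetric about zero rather than linear.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.3f}")
```

Here the positive and negative cross-products cancel, so r comes out as zero even though Y can be predicted perfectly from X.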
9.

What would you guess the value of the correlation coefficient to be for the pair of variables: "number of man-hours worked" and "number of units of work completed"?

  1. Approximately 0.9
  2. Approximately 0.4
  3. Approximately 0.0
  4. Approximately -0.4
  5. Approximately -0.9
10.

In a given group, the correlation between height measured in feet and weight measured in pounds is +.68. Which of the following would alter the value of r?

  1. height is expressed in centimeters.
  2. weight is expressed in kilograms.
  3. both of the above will affect r.
  4. neither of the above changes will affect r.
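One way to explore question 10 is to compute r before and after a change of units. The sketch below uses invented heights and weights (not data from the text) and a hypothetical helper `pearson_r`; the conversion factors are the standard 30.48 cm per foot and 0.45359237 kg per pound.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical measurements, for illustration only.
heights_ft = [5.0, 5.5, 5.8, 6.0, 6.2]
weights_lb = [120, 150, 160, 180, 175]

# Re-express the same measurements in metric units.
heights_cm = [h * 30.48 for h in heights_ft]
weights_kg = [w * 0.45359237 for w in weights_lb]

r_imperial = pearson_r(heights_ft, weights_lb)
r_metric = pearson_r(heights_cm, weights_kg)
print(f"feet/pounds: r = {r_imperial:.6f}")
print(f"cm/kg:       r = {r_metric:.6f}")
```

Multiplying a variable by a positive constant scales the numerator and denominator of r by the same factor, which is why the two printed values agree.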

13.2 Testing the Significance of the Correlation Coefficient

11.

Define a t Test of a Regression Coefficient, and give a unique example of its use.

12.

The correlation between scores on a neuroticism test and scores on an anxiety test is high and positive; therefore

  1. anxiety causes neuroticism
  2. those who score low on one test tend to score high on the other.
  3. those who score low on one test tend to score low on the other.
  4. no prediction from one test to the other can be meaningfully made.

13.3 Linear Equations

13.

True or False? If False, correct it: Suppose a 95% confidence interval for the slope $\beta$ of the straight line regression of Y on X is given by $-3.5 < \beta < -0.5$. Then a two-sided test of the hypothesis $H_0: \beta = -1$ would result in rejection of $H_0$ at the 1% level of significance.

14.

True or False: It is safer to interpret correlation coefficients as measures of association rather than causation because of the possibility of spurious correlation.

15.

We are interested in finding the linear relation between the number of widgets purchased at one time and the cost per widget. The following data has been obtained:

X: Number of widgets purchased – 1, 3, 6, 10, 15

Y: Cost per widget (in dollars) – 55, 52, 46, 32, 25

Suppose the regression line is $\hat{y} = -2.5x + 60$. We compute the average price per widget if 30 are purchased and observe which of the following?

  1. $\hat{y} = 15$ dollars; obviously, we are mistaken; the prediction $\hat{y}$ is actually +15 dollars.
  2. $\hat{y} = 15$ dollars, which seems reasonable judging by the data.
  3. $\hat{y} = -15$ dollars, which is obvious nonsense. The regression line must be incorrect.
  4. $\hat{y} = -15$ dollars, which is obvious nonsense. This reminds us that predicting Y outside the range of X values in our data is a very poor practice.
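The calculation in question 15 can be checked with a few lines of Python. This is a minimal sketch that takes the regression line as given in the question and adds a simple range check; the names `predict` and `in_range` are hypothetical.

```python
# X values from the question: number of widgets purchased at one time.
xs = [1, 3, 6, 10, 15]


def predict(x):
    """Evaluate the regression line stated in the question."""
    return -2.5 * x + 60


x_new = 30
y_hat = predict(x_new)

# A prediction is an extrapolation when x_new lies outside the observed X range.
in_range = min(xs) <= x_new <= max(xs)

print(f"predicted cost per widget at x = {x_new}: {y_hat} dollars")
print(f"x = {x_new} inside observed range [{min(xs)}, {max(xs)}]? {in_range}")
```

A guard like `in_range` is a cheap way to flag predictions that rest on extending the fitted line beyond the data used to fit it.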
16.

Discuss briefly the distinction between correlation and causality.

17.

True or False: If r is close to + or -1, we shall say there is a strong correlation, with the tacit understanding that we are referring to a linear relationship and nothing else.

13.4 The Regression Equation

18.

Suppose that you have at your disposal the information below for each of 30 drivers. Propose a model (including a very brief indication of symbols used to represent independent variables) to explain how miles per gallon vary from driver to driver on the basis of the factors measured.

Information:
  1. miles driven per day
  2. weight of car
  3. number of cylinders in car
  4. average speed
  5. miles per gallon
  6. number of passengers
19.

Consider a sample least squares regression analysis between a dependent variable (Y) and an independent variable (X). A sample correlation coefficient of −1 (minus one) tells us that

  1. there is no relationship between Y and X in the sample
  2. there is no relationship between Y and X in the population
  3. there is a perfect negative relationship between Y and X in the population
  4. there is a perfect negative relationship between Y and X in the sample.
20.

In correlational analysis, when the points scatter widely about the regression line, this means that the correlation is

  1. negative.
  2. low.
  3. heterogeneous.
  4. between two measures that are unreliable.

13.5 Interpretation of Regression Coefficients: Elasticity and Logarithmic Transformation

21.

In a linear regression, why do we need to be concerned with the range of the independent (X) variable?

22.

Suppose one collected the following information where X is diameter of tree trunk and Y is tree height.

X Y
4 8
2 4
8 18
6 22
10 30
6 8
Table 13.3

Regression equation: $\hat{y}_i = -3.6 + 3.1X_i$

What is your estimate of the average height of all trees having a trunk diameter of 7 inches?
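The regression equation quoted for Table 13.3 can be reproduced from the data with the usual least-squares formulas. The following is a sketch using plain Python; the function name `fit_line` is an illustrative assumption.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Data from Table 13.3: trunk diameter (X) and tree height (Y).
diameters = [4, 2, 8, 6, 10, 6]
heights = [8, 4, 18, 22, 30, 8]

b0, b1 = fit_line(diameters, heights)
estimate = b0 + b1 * 7  # estimated mean height at a trunk diameter of 7

print(f"intercept = {b0:.1f}, slope = {b1:.1f}")
print(f"estimated average height at diameter 7: {estimate:.1f}")
```

A diameter of 7 lies inside the observed range of 2 to 10, so this estimate is an interpolation rather than an extrapolation.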

23.

The manufacturers of a chemical used in flea collars claim that under standard test conditions each additional unit of the chemical will bring about a reduction of 5 fleas (i.e., where $X_j$ = amount of chemical and $Y_j = B_0 + B_1 X_j + E_j$, $H_0: B_1 = -5$).

Suppose that a test has been conducted and results from a computer include:

Intercept = 60

Slope = −4

Standard error of the regression coefficient = 1.0

Degrees of Freedom for Error = 2000

95% Confidence Interval for the slope: (−5.96, −2.04)

Is this evidence consistent with the claim that the number of fleas is reduced at a rate of 5 fleas per unit chemical?
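The quantities reported in question 23 fit together as sketched below. The critical value 1.96 is an assumption: with 2000 degrees of freedom the Student's t distribution is very close to the standard normal, so the large-sample value is used in place of an exact table lookup.

```python
slope_hat = -4.0  # estimated slope from the computer output
std_error = 1.0   # standard error of the regression coefficient
claimed = -5.0    # slope under H0: B1 = -5

# Assumption: large-sample critical value for a two-sided 95% interval.
t_crit = 1.96

# Test statistic for H0: B1 = claimed.
t_stat = (slope_hat - claimed) / std_error

# 95% confidence interval for the slope, written as (lower, upper).
ci = (slope_hat - t_crit * std_error, slope_hat + t_crit * std_error)
claim_inside = ci[0] <= claimed <= ci[1]

print(f"t statistic = {t_stat:.2f}")
print(f"95% CI for the slope = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"claimed slope inside the interval? {claim_inside}")
```

Checking whether the hypothesized value falls inside the 95% interval is equivalent to a two-sided test at the 5% level.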

13.6 Predicting with a Regression Equation

24.

True or False? If False, correct it: Suppose you are performing a simple linear regression of Y on X and you test the hypothesis that the slope $\beta$ is zero against a two-sided alternative. You have $n = 25$ observations and your computed test (t) statistic is 2.6. Then your P-value is given by .01 < P < .02, which gives borderline significance (i.e., you would reject $H_0$ at $\alpha = .02$ but fail to reject $H_0$ at $\alpha = .01$).
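The P-value in question 24 can be approximated without statistical tables. The sketch below assumes the usual n − 2 degrees of freedom for a simple linear regression slope and integrates the Student's t density numerically with Simpson's rule; the function names are hypothetical, and a statistics library would normally be used instead.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided P-value via Simpson's rule on [0, |t_stat|]; steps must be even."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * t_pdf(k * h, df)
    area = total * h / 3  # probability mass between 0 and |t_stat|
    return 2 * (0.5 - area)


n = 25
df = n - 2  # assumption: simple linear regression with an intercept
p_value = two_sided_p(2.6, df)
print(f"df = {df}, two-sided P-value = {p_value:.4f}")
```

Because the t density is symmetric about zero, the two-sided P-value is twice the area in one tail beyond |t|.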

25.

An economist is interested in the possible influence of "Miracle Wheat" on the average yield of wheat in a district. To do so, he fits a linear regression of average yield per year against year after introduction of "Miracle Wheat" for a ten-year period.

The fitted trend line is

$\hat{y}_j = 80 + 1.5X_j$

($Y_j$: average yield in the $j$th year after introduction)

($X_j$: $j$th year after introduction).

  1. What is the estimated average yield for the fourth year after introduction?
  2. Do you want to use this trend line to estimate yield for, say, 20 years after introduction? Why? What would your estimate be?
26.

An interpretation of $r = 0.5$ is that the following part of the Y-variation is associated with variation in X:

  1. most
  2. half
  3. very little
  4. one quarter
  5. none of these
27.

Which of the following values of r indicates the most accurate prediction of one variable from another?

  1. $r = 1.18$
  2. $r = -.77$
  3. $r = .68$

13.7 How to Use Microsoft Excel® for Regression Analysis

28.

A computer program for multiple regression has been used to fit $\hat{y}_j = b_0 + b_1 X_{1j} + b_2 X_{2j} + b_3 X_{3j}$.

Part of the computer output includes:

i    $b_i$    $S_{b_i}$
0    8        1.6
1    2.2      .24
2    -.72     .32
3    0.005    0.002
Table 13.4
  1. Calculation of the confidence interval for $b_2$ consists of _______ ± (a Student's t value) (_______)
  2. The confidence level for this interval is reflected in the value used for _______.
  3. The degrees of freedom available for estimating the variance are directly concerned with the value used for _______
29.

An investigator has used a multiple regression program on 20 data points to obtain a regression equation with 3 variables. Part of the computer output is:

Variable   Coefficient   Standard Error of $b_i$
1 0.45 0.21
2 0.80 0.10
3 3.10 0.86
Table 13.5
  1. 0.80 is an estimate of ___________.
  2. 0.10 is an estimate of ___________.
  3. Assuming the responses satisfy the normality assumption, we can be 95% confident that the value of $\beta_2$ is in the interval _______ ± [$t_{.025}$ ⋅ _______], where $t_{.025}$ is the critical value of the Student's t distribution with ____ degrees of freedom.
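The pattern behind questions 28 and 29 is the interval "estimate ± t × standard error", with error degrees of freedom n − k − 1 for k predictors plus an intercept. The sketch below applies it to the second coefficient in Table 13.5. It assumes the fitted model includes an intercept, and the critical value 2.120 is taken from a standard Student's t table for 16 degrees of freedom rather than computed.

```python
n_points = 20      # data points used in the regression
n_predictors = 3   # independent variables in the equation

# Assumption: the model includes an intercept, so one more parameter is estimated.
df_error = n_points - n_predictors - 1

b2 = 0.80      # estimated coefficient for variable 2 (Table 13.5)
se_b2 = 0.10   # its estimated standard error (Table 13.5)

# Assumption: t_.025 with 16 degrees of freedom, read from a standard t table.
t_crit = 2.120

lower = b2 - t_crit * se_b2
upper = b2 + t_crit * se_b2

print(f"error degrees of freedom = {df_error}")
print(f"95% CI for beta_2 = ({lower:.3f}, {upper:.3f})")
```

The same three ingredients (estimate, standard error, and a t value chosen by confidence level and degrees of freedom) fill the blanks in question 28 as well.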
Citation/Attribution

Want to cite, share, or modify this book? This book is Creative Commons Attribution License 4.0 and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/introductory-business-statistics/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/introductory-business-statistics/pages/1-introduction
Citation information

© Nov 29, 2017 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License 4.0 license. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.