Statistics

Chapter Review

12.1 Linear Equations

The most basic type of association is a linear association. This type of relationship can be described algebraically (by an equation), numerically (with actual or predicted data values), or graphically (from a plotted line). Lines are classified as straight curves. Algebraically, a linear equation typically takes the form y = mx + b, where m and b are constants, x is the independent variable, and y is the dependent variable. In a statistical context, a linear equation is written in the form y = a + bx, where a and b are constants. This form helps distinguish the statistical context from the algebraic context. In the equation y = a + bx, the constant b, the coefficient that multiplies x, is called the slope. The slope describes the rate of change between the independent and dependent variables, that is, how much the dependent variable changes as the independent variable changes. The constant a is called the y-intercept. Graphically, the y-intercept is the y-coordinate of the point where the line crosses the y-axis; at this point, x = 0.

The slope of a line describes the rate of change between the independent and dependent variables. It tells us how much the dependent variable (y) changes, on average, for every one-unit increase in the independent variable (x). The y-intercept gives the value of the dependent variable when the independent variable equals zero. Graphically, the sign of the slope distinguishes three types of lines: if b > 0, the line rises from left to right; if b = 0, the line is horizontal; and if b < 0, the line falls from left to right.
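
As a small illustration of the slope and y-intercept, the sketch below evaluates y = a + bx for made-up constants. The function names `linear` and `slope_type` and the values a = 2.0 and b = 0.5 are assumptions chosen only for this example.

```python
def linear(a, b, x):
    """Return y = a + b*x, where a is the y-intercept and b is the slope."""
    return a + b * x


def slope_type(b):
    """Classify a line by the sign of its slope."""
    if b > 0:
        return "rises from left to right"
    if b < 0:
        return "falls from left to right"
    return "horizontal"


# Hypothetical constants, chosen only for illustration.
a, b = 2.0, 0.5

y_at_zero = linear(a, b, 0)                         # value of y when x = 0
change_per_unit = linear(a, b, 4) - linear(a, b, 3)  # change in y for a one-unit increase in x

print(y_at_zero)        # 2.0, the y-intercept a
print(change_per_unit)  # 0.5, the slope b
print(slope_type(b))    # rises from left to right
```

The one-unit change in y equals b no matter which pair of consecutive x-values is used, which is what makes the relationship linear.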

12.2 The Regression Equation

A regression line, or line of best fit, can be drawn on a scatter plot and used to predict the value of y for a given value of x in a data set or sample. There are several ways to find a regression line, but the least-squares regression line is the one usually used. Residuals, also called errors, measure the vertical distance between the actual value of y and the estimated value of y. The least-squares line is the line for which the sum of squared errors, or SSE, is as small as possible. Regression lines can be used to predict values within the range of the given data but should not be used to make predictions for values outside that range.
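
The following sketch shows one common way to compute the least-squares line, the residuals, and SSE by hand. The closed-form formulas for b and a are the standard least-squares formulas rather than something stated in this review, and the data and function names are made up for illustration.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # y-intercept
    return a, b


def residuals(xs, ys, a, b):
    """Return the residuals y - y-hat for each data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sse(xs, ys, a, b):
    """Return the sum of squared errors for the line y-hat = a + b*x."""
    return sum(e ** 2 for e in residuals(xs, ys, a, b))


# Made-up sample data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
print(round(a, 4), round(b, 4))      # roughly a = 2.2 and b = 0.6
print(round(sse(xs, ys, a, b), 4))   # roughly 2.4

# Any other line gives a larger SSE than the least-squares line.
print(sse(xs, ys, a + 0.1, b) > sse(xs, ys, a, b))  # True
```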

The correlation coefficient, r, measures the strength of the linear association between x and y. The variable r has to be between –1 and +1. When r is positive, x and y tend to increase and decrease together. When r is negative, x increases and y decreases, or the opposite occurs: x decreases and y increases. The coefficient of determination, r², is equal to the square of the correlation coefficient. When expressed as a percentage, r² represents the percentage of variation in the dependent variable, y, that can be explained by variation in the independent variable, x, using the regression line.
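
As a rough sketch, r and r² can be computed directly from the sums of squares. The formula used here is the standard Pearson formula, which this review does not spell out, and the data and the function name `correlation` are made up for illustration.

```python
import math


def correlation(xs, ys):
    """Return the sample correlation coefficient r between xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up sample data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
r_squared = r ** 2

print(round(r, 4))          # roughly 0.7746, a positive linear association
print(round(r_squared, 4))  # roughly 0.6
```

For this made-up sample, about 60 percent of the variation in y is explained by variation in x using the regression line.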

12.3 Testing the Significance of the Correlation Coefficient (Optional)

Linear regression is a procedure for fitting a straight line of the form ŷ = a + bx to data. The conditions for regression are as follows:

  • Linear: In the population, there is a linear relationship that models the average value of y for different values of x.
  • Independent: The residuals are assumed to be independent.
  • Normal: The y values are distributed normally for any value of x.
  • Equal variance: The standard deviation of the y values is equal for each x value.
  • Random: The data are produced from a well-designed random sample or a randomized experiment.

The slope b and intercept a of the least-squares line estimate the slope β and intercept α of the population (true) regression line. To estimate the population standard deviation of y (σ) use the standard deviation of the residuals: $s = \sqrt{\frac{SSE}{n-2}}$. The variable ρ (rho) is the population correlation coefficient. To test the null hypothesis, H0: ρ = hypothesized value, use a linear regression t-test. The most common null hypothesis is H0: ρ = 0, which indicates there is no linear relationship between x and y in the population. The TI-83, 83+, 84, 84+ calculator function LinRegTTest can perform this test (STATS, TESTS, LinRegTTest).
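
The sketch below computes the standard deviation of the residuals, s, and a test statistic for H0: ρ = 0. The statistic t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom is the usual form of this test and is an assumption here, since this review does not state it. The data and the function name `regression_t_statistic` are made up, and the p-value is left out because Python's standard library has no t-distribution.

```python
import math


def regression_t_statistic(xs, ys):
    """Return (s, t, df) for a linear regression t-test of H0: rho = 0.

    s  is the standard deviation of the residuals, sqrt(SSE / (n - 2)).
    t  is r * sqrt(n - 2) / sqrt(1 - r**2); undefined when r is exactly +1 or -1.
    df is the degrees of freedom, n - 2.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)

    b = sxy / sxx          # least-squares slope
    a = y_bar - b * x_bar  # least-squares intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

    s = math.sqrt(sse / (n - 2))
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return s, t, n - 2


# Made-up sample data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

s, t, df = regression_t_statistic(xs, ys)
print(round(s, 4), round(t, 4), df)
```

The resulting t would then be compared with a t-distribution with df degrees of freedom, which is what a calculator routine such as LinRegTTest reports as a p-value.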

12.4 Prediction (Optional)

After establishing that the linear correlation between x and y is strong and calculating the line of best fit, you can use the least-squares regression line to make predictions about your data.
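
A minimal sketch of prediction with a fitted line follows. It refuses to predict outside the observed range of x, in line with the caution in section 12.2. The function name `predict`, the fitted constants, and the range limits are hypothetical values chosen for illustration.

```python
def predict(a, b, x, x_min, x_max):
    """Return y-hat = a + b*x, but only for x inside the observed range."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} is outside the observed range [{x_min}, {x_max}]; "
            "the regression line should not be used to predict there"
        )
    return a + b * x


# Hypothetical fitted line and observed x-range, for illustration only.
a, b = 2.2, 0.6
x_min, x_max = 1, 5

print(round(predict(a, b, 3, x_min, x_max), 4))  # prediction inside the range

try:
    predict(a, b, 10, x_min, x_max)
except ValueError as err:
    print(err)  # explains why no prediction is made for x = 10
```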

12.5 Outliers

To determine whether a point is an outlier, do one of the following:

  1. Input the following equations into the TI 83, 83+, 84, or 84+ calculator:


    $y_1 = a + bx$
    $y_2 = a + bx + 2s$
    $y_3 = a + bx - 2s$

    where s is the standard deviation of the residuals.

    If any point is above y2 or below y3, then the point is considered to be an outlier.

  2. Use the residuals and compare their absolute values to 2s, where s is the standard deviation of the residuals. If the absolute value of any residual is greater than or equal to 2s, then the corresponding point is an outlier.
  Note: The calculator function LinRegTTest (STATS, TESTS, LinRegTTest) calculates s.
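
The residual-based check in method 2 can be sketched in code as follows. The data and the function name `find_outliers` are made up for illustration, and s is computed as the square root of SSE / (n − 2), as in section 12.3.

```python
import math


def find_outliers(xs, ys):
    """Return the (x, y) points whose absolute residual is at least 2s."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)

    b = sxy / sxx          # least-squares slope
    a = y_bar - b * x_bar  # least-squares intercept

    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))  # std. dev. of residuals

    # Equivalent to asking whether the point lies on or outside the band
    # between y3 = a + bx - 2s and y2 = a + bx + 2s.
    return [(x, y) for x, y, e in zip(xs, ys, resid) if abs(e) >= 2 * s]


# Made-up data: every point follows y = 2x except the one at x = 4.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, 6, 20, 10, 12, 14, 16]

print(find_outliers(xs, ys))  # [(4, 20)]
```

Because the outlier itself pulls the fitted line toward it and inflates s, a point at the edge of the x-range may need a larger deviation to be flagged than one near the middle.
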
Citation/Attribution

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute Texas Education Agency (TEA). The original material is available at: https://www.texasgateway.org/book/tea-statistics . Changes were made to the original material, including updates to art, structure, and other content updates.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/statistics/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/statistics/pages/1-introduction
Citation information

© Apr 16, 2024 Texas Education Agency (TEA). The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.