Principles of Finance

14.2 Linear Regression Analysis

Learning Outcomes

By the end of this section, you will be able to:

  • Analyze a regression using the method of least squares and residuals.
  • Test the assumptions for linear regression.

Method of Least Squares and Residuals

Once the correlation coefficient has been calculated and a determination has been made that the correlation is significant, typically a regression model is then developed. In this discussion we will focus on linear regression, where a straight line is used to model the relationship between the two variables. Once a straight-line model is developed, this model can then be used to predict the value of the dependent variable for a specific value of the independent variable.

Recall from algebra that the equation of a straight line is given by

$$y = mx + b \tag{14.2}$$

where m is the slope of the line and b is the y-intercept of the line.

The slope measures the steepness of the line, and the y-intercept is the point where the graph crosses, or intercepts, the y-axis.

In linear regression analysis, the equation of the straight line is written in a slightly different way using the model

$$\hat{y} = a + bx \tag{14.3}$$

In this format, b is the slope of the line, and a is the y-intercept. The notation ŷ is called y-hat and is used to indicate a predicted value of the dependent variable y for a certain value of the independent variable x.

If a line extends uphill from left to right, the slope is a positive value, and if the line extends downhill from left to right, the slope is a negative value. Refer to Figure 14.3.

Three possible line graphs for the equation ŷ = a + bx, labeled (a), (b), and (c) respectively. Graph (a) shows a line sloping upward to the right, if b > 0. Graph (b) shows a horizontal line, if b = 0. Graph (c) shows a line sloping downward to the right, if b < 0.
Figure 14.3 Three Possible Graphs of ŷ = a + bx. (a) If b > 0, the line slopes upward to the right. (b) If b = 0, the line is horizontal. (c) If b < 0, the line slopes downward to the right.

When generating the equation of a line in algebra using y = mx + b, two (x, y) points were required to generate the equation. However, in regression analysis, all (x, y) points in the data set will be utilized to develop the linear regression model.

The first step in any regression analysis is to create the scatter plot. Then proceed to calculate the correlation coefficient r, and check this value for significance. If we think that the points show a linear relationship, we would like to draw a line on the scatter plot. This line can be calculated through a process called linear regression. However, we only calculate a regression line if one of the variables helps to explain or predict the other variable. If x is the independent variable and y the dependent variable, then we can use a regression line to predict y for a given value of x.

As an example of a regression equation, assume that a correlation exists between the monthly amount spent on advertising and the monthly revenue for a Fortune 500 company. After collecting (x, y) data for a certain time period, the company determines the regression equation is of the form

$$\hat{y} = 9{,}376.7 + 61.8x \tag{14.4}$$

where x represents the monthly amount spent on advertising (in thousands of dollars) and ŷ represents the monthly revenues for the company (in thousands of dollars).

A scatter plot of the (x, y) data is shown in Figure 14.4.

A scatter plot helps a Fortune 500 company predict its monthly revenue, depending on level of advertising spend. The diagram shows its revenue increasing from approximately $12,000,000 to $19,000,000, as advertising spend increases from approximately $50,000 to $150,000.
Figure 14.4 Scatter Plot of Revenue versus Advertising for a Fortune 500 Company ($000s)

The Fortune 500 company would like to predict the monthly revenue if its executives decide to spend $150,000 in advertising next month. To determine the estimate of monthly revenue, let x = 150 in the regression equation and calculate a corresponding value for ŷ:

$$
\begin{aligned}
\hat{y} &= 9{,}376.7 + 61.8x \\
\hat{y} &= 9{,}376.7 + 61.8(150) \\
\hat{y} &= 18{,}646.7
\end{aligned}
\tag{14.5}
$$

This predicted value of y indicates that the anticipated revenue would be $18,646,700, given the advertising spend of $150,000.

Notice that from past data, there may have been a month where the company actually did spend $150,000 on advertising, and thus the company may have an actual result for the monthly revenue. This actual, or observed, amount can be compared to the prediction from the linear regression model to calculate a residual.

A residual is the difference between an observed y-value and the predicted y-value obtained from the linear regression equation. As an example, assume that in a previous month, the actual monthly revenue for an advertising spend of $150,000 was $19,200,000, and thus y = 19,200. The residual for this data point can be calculated as follows:

$$
\begin{aligned}
\text{Residual} &= \text{observed } y\text{-value} - \text{predicted } y\text{-value} \\
\text{Residual} &= y - \hat{y} \\
\text{Residual} &= 19{,}200 - 18{,}646.7 = 553.3
\end{aligned}
\tag{14.6}
$$

Notice that residuals can be positive, negative, or zero. If the observed y-value exactly matches the predicted y-value, then the residual will be zero. If the observed y-value is greater than the predicted y-value, then the residual will be a positive value. If the observed y-value is less than the predicted y-value, then the residual will be a negative value.
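
The prediction and residual calculations above can be reproduced in a few lines of code. The following is a minimal Python sketch, not part of the original text; the function names `predict_revenue` and `residual` are illustrative choices, and the coefficients are those of the example regression equation (14.4).

```python
# Coefficients from the example regression equation (14.4).
# Both x and y are measured in thousands of dollars.
INTERCEPT = 9376.7
SLOPE = 61.8


def predict_revenue(advertising):
    """Return the predicted monthly revenue (y-hat) for a given advertising spend."""
    return INTERCEPT + SLOPE * advertising


def residual(observed, predicted):
    """Return the observed y-value minus the predicted y-value."""
    return observed - predicted


predicted = predict_revenue(150)    # advertising spend of $150,000
res = residual(19200, predicted)    # observed revenue of $19,200,000

print(f"Predicted revenue ($000s): {predicted:,.1f}")
print(f"Residual ($000s): {res:,.1f}")
```

A positive result from `residual` means the observed revenue was above the model's prediction, and a negative result means it was below.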

When fitting the linear regression line of best fit to the points on the scatter plot, the mathematical analysis generates the linear equation for which the sum of the squared residuals is minimized. This analysis is referred to as the method of least squares. The resulting line is the “best fit” to the points on the scatter plot in the sense that no other straight line produces a smaller sum of squared differences between the observed and predicted values of y.
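
Written out, the quantity that the method of least squares minimizes is the sum of squared residuals over all n data points. The abbreviation SSE and the subscript i are notation introduced here for illustration and do not appear in the text above:

```latex
\mathrm{SSE} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
             = \sum_{i=1}^{n} \left( y_i - (a + b x_i) \right)^2
```

Squaring each residual keeps positive and negative residuals from canceling one another in the sum.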

Think It Through

Calculating a Residual

Suppose that the chief financial officer of a corporation has created a linear model for the relationship between the company stock and interest rates. When interest rates are at 5%, the company stock has a value of $94. Using the linear model, when interest rates are at 5%, the model predicts the value of the company stock to be $99. Calculate the residual for this data point.
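
One way to work this, applying the residual definition from earlier in the section with an observed value of $94 and a predicted value of $99:

```latex
\text{Residual} = y - \hat{y} = 94 - 99 = -5
```

The negative residual indicates that the observed stock value is $5 below the value predicted by the model.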

The goal in the regression analysis is to determine the coefficients a and b in the following regression equation:

$$\hat{y} = a + bx \tag{14.8}$$

Once the (x, y) data have been collected, the slope (b) and y-intercept (a) can be calculated using the following formulas:

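$$
\begin{aligned}
b &= \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \\
a &= \frac{\sum y}{n} - b\,\frac{\sum x}{n}
\end{aligned}
\tag{14.9}
$$

where n refers to the number of data pairs and Σx indicates the sum of the x-values.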

Notice that the formula for the y-intercept requires the use of the slope result (b), and thus the slope should be calculated first and the y-intercept should be calculated second.
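
The slope and intercept formulas can be applied directly in code. The following is a minimal Python sketch under stated assumptions: the function name `fit_line` and the small data set are hypothetical and are not taken from the text. The slope is computed first, then the intercept, in the order described above.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x-values are identical; the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denominator   # slope is calculated first
    a = sum_y / n - b * sum_x / n                    # intercept uses the slope
    return a, b


# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]

a, b = fit_line(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x")
```

For these four points the sums are Σx = 10, Σy = 16, Σxy = 47, and Σx² = 30, which give a slope of 28/20 = 1.4 and an intercept of 4 - 1.4(2.5) = 0.5.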

When making predictions for y, it is always important to plot a scatter diagram first. If the scatter plot indicates that there is a linear relationship between the variables, then it is reasonable to use a best-fit line to make predictions for y, given x within the domain of x-values in the sample data, but not necessarily for x-values outside that domain.
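
The caution about predicting outside the sampled x-values can be built into a prediction helper. The following is a minimal Python sketch; the function name `predict_within_domain` and the list of sampled advertising amounts are hypothetical, while the coefficients are those of the advertising example in equation (14.4).

```python
def predict_within_domain(x, a, b, sample_xs):
    """Return y-hat = a + b*x, refusing x-values outside the sampled range."""
    low, high = min(sample_xs), max(sample_xs)
    if not low <= x <= high:
        raise ValueError(f"x={x} is outside the sampled range [{low}, {high}]")
    return a + b * x


# Hypothetical sampled advertising amounts, in thousands of dollars.
sample_xs = [50, 75, 100, 125, 150]

# Coefficients from the advertising example, equation (14.4).
estimate = predict_within_domain(120, 9376.7, 61.8, sample_xs)
print(f"Predicted revenue ($000s): {estimate:,.1f}")
```

Raising an error is a deliberately strict choice for this sketch. Because the text says only that predictions outside the domain are not necessarily reasonable, issuing a warning instead would be an equally defensible design.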

Note: Computer spreadsheets, statistical software, and many calculators can quickly calculate the best-fit line and create the graphs. The calculations tend to be tedious if done by hand.

Assumptions for Linear Regression

Testing the significance of the correlation coefficient requires that certain assumptions about the data are satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or feasible to do so. We are examining the sample to draw a conclusion about whether the linear relationship that we see between x and y in the sample data provides strong enough evidence that we can conclude that there is a linear relationship between x and y in the population.

The regression line equation that we calculate from the sample data gives the best-fit line for our particular sample. We want to use this best-fit line for the sample as an estimate of the best-fit line for the population (Figure 14.5). Examining the scatter plot and testing the significance of the correlation coefficient helps us determine if it is appropriate to do this.

These are the assumptions underlying the test of significance:

  1. There is a linear relationship in the population that models the average value of y for varying values of x. In other words, the expected value of y for each particular value of x lies on a straight line in the population. (We do not know the equation for the line for the population. Our regression line from the sample is our best estimate of this line in the population.)
  2. The y-values for any particular x-value are normally distributed about the line. This implies that there are more y-values scattered closer to the line than are scattered farther away. Assumption (1) implies that these normal distributions are centered on the line: the means of these normal distributions of y-values lie on the line.
  3. The standard deviations of the population y-values about the line are equal for each value of x. In other words, each of these normal distributions of y-values has the same shape and spread about the line.
  4. The residual errors are mutually independent (no pattern).
  5. The data are produced from a well-designed, random sample or randomized experiment.

Two diagrams of a best-fit line. The first diagram (a) shows a linearly descending line running through the center of three vertical sets of scattered points. The second diagram (b) shows a linearly descending line running through the mean of three tilted bell curves. The bottom of each bell curve aligns with the position of the three vertical scattered points in diagram a.
Figure 14.5 Best-Fit Line. The y-values for each x-value are normally distributed about the line with the same standard deviation. For each x-value, the mean of the y-values lies on the regression line. More y-values lie near the line than are scattered further away from the line.
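
Residuals offer a rough, informal way to look at some of these assumptions. The following is a minimal Python sketch, not a formal statistical test: it computes the residuals for an already fitted line and compares their spread for the lower and upper halves of the x-values, which loosely relates to assumption 3 (equal standard deviations). The function names and the data are hypothetical, and the data were chosen so that the least-squares line is exactly ŷ = 0 + 2x.

```python
from statistics import mean, pstdev


def residuals(xs, ys, a, b):
    """Return the residual y - (a + b*x) for each data pair."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def spread_by_half(xs, res):
    """Return the residual standard deviation for the lower and upper half of x."""
    ordered = [r for _, r in sorted(zip(xs, res))]   # residuals ordered by x
    mid = len(ordered) // 2
    return pstdev(ordered[:mid]), pstdev(ordered[mid:])


# Hypothetical data whose least-squares line is y-hat = 0 + 2x.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.8, 6.1, 8.1, 9.8, 12.1]

res = residuals(xs, ys, a=0.0, b=2.0)
low_spread, high_spread = spread_by_half(xs, res)

print(f"Mean residual: {mean(res):.3f}")
print(f"Spread (lower x half): {low_spread:.3f}")
print(f"Spread (upper x half): {high_spread:.3f}")
```

Similar spreads in the two halves are consistent with assumption 3, while a spread that grows or shrinks markedly with x suggests the assumption may not hold. A plot of residuals against x is the more usual way to look for such patterns.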
Citation/Attribution

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/principles-finance/pages/1-why-it-matters
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/principles-finance/pages/1-why-it-matters
Citation information

© Jun 8, 2023 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.