### 14.1 Correlation Analysis

Correlation is a measure of the association between two numeric variables. The correlation coefficient, *r*, is used to assess the strength and direction of this association, and its value always lies between $-1$ and $+1$. The magnitude of *r* (its absolute value) indicates the strength of the linear relationship between the two variables: values close to $-1$ or $+1$ indicate a strong linear relationship, while values near $0$ indicate a weak one. A positive value of *r* means that when *x* increases, *y* tends to increase, and when *x* decreases, *y* tends to decrease (positive correlation). A negative value of *r* means that when *x* increases, *y* tends to decrease, and when *x* decreases, *y* tends to increase (negative correlation).
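
The usual coefficient of this kind is Pearson's *r*, defined as $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$, where $S_{xy}$ is the sum of cross-products of deviations from the means and $S_{xx}$, $S_{yy}$ are the sums of squared deviations. The following is a minimal Python sketch of that definition using only the standard library; the function name `pearson_r` and the data are hypothetical, chosen only to illustrate a positive and a negative correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired samples xs and ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [2, 4, 5, 4, 5]), 4))  # 0.7746: positive correlation
print(pearson_r(x, [10, 8, 6, 4, 2]))           # -1.0: perfect negative linear relationship
```

The sign of `s_xy` sets the direction of the relationship, and dividing by $\sqrt{S_{xx}S_{yy}}$ is what keeps *r* between $-1$ and $+1$.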

### 14.2 Linear Regression Analysis

Linear regression analysis fits a straight line to model the relationship between two variables. Once the straight-line model is developed, it can be used to predict the value of the dependent variable for a specific value of the independent variable. Two parameters are calculated for the linear model: the slope and the *y*-intercept of the best-fit line. Both are generated by the method of least squares, which minimizes the sum of the squared differences between the observed and predicted values of *y*.
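
As a sketch of the standard derivation, write the line as $\hat{y}=a+bx$, with slope *b* and *y*-intercept *a*. Least squares chooses *a* and *b* to minimize the sum of squared errors (SSE) over the $n$ observations:

$$
\text{SSE}(a, b) = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^{n} \left(y_i - a - b x_i\right)^2
$$

Setting the partial derivatives of SSE with respect to *a* and *b* to zero gives the closed-form solution, where $\bar{x}$ and $\bar{y}$ are the sample means:

$$
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x}
$$

The second equation shows that the best-fit line always passes through the point $(\bar{x}, \bar{y})$.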

### 14.3 Best-Fit Linear Model

Once a correlation has been deemed significant, a linear regression model is developed. The goal of the regression analysis is to determine the coefficients *a* and *b* in the regression equation $\hat{y}=a+bx$, where *a* is the *y*-intercept and *b* is the slope. Because manual calculations are cumbersome, technology such as Excel, the R statistical tool, or a calculator is typically used to generate these coefficients.
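
To see what such tools compute, the least-squares coefficients can be coded directly from their closed-form formulas, $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$, where $S_{xy}=\sum(x_i-\bar{x})(y_i-\bar{y})$ and $S_{xx}=\sum(x_i-\bar{x})^2$. This is a minimal Python sketch; the helper `fit_line` and the data are hypothetical. For the same data, Excel's `SLOPE` and `INTERCEPT` functions or R's `lm()` should report the same coefficients.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y-hat = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    b = s_xy / s_xx        # slope
    a = y_bar - b * x_bar  # y-intercept: the line passes through (x_bar, y_bar)
    return a, b


# Hypothetical data for illustration only
a, b = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y-hat = {a:.2f} + {b:.2f}x")  # y-hat = 2.20 + 0.60x
```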

### 14.4 Regression Applications in Finance

Regression analysis is used extensively in finance. Many typical applications involve determining whether there is a correlation between stock market indices such as the S&P 500, the Dow Jones Industrial Average (DJIA), and the Russell 2000. The procedure is to first generate a scatter plot to check visually for a trend, then calculate the correlation coefficient and test it for significance. If the correlation is significant, a linear model can then be generated and used for predictions.
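
The correlation and significance steps of this procedure can be sketched in Python. One common significance check, assumed here because the text does not name a specific test, is the two-tailed *t*-test of $H_0: \rho = 0$, with test statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ and $n-2$ degrees of freedom. The index returns below are hypothetical, not real S&P 500 or DJIA data; the critical value 2.447 is the tabulated two-tailed value for $\alpha = 0.05$ with 6 degrees of freedom, and the scatter-plot step is omitted.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired samples."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def correlation_t_test(xs, ys, t_critical):
    """Two-tailed t-test of H0: rho = 0. Returns (r, t, significant)."""
    n = len(xs)
    r = pearson_r(xs, ys)
    if abs(r) >= 1.0:
        # Perfect linear fit: the t statistic is unbounded
        return r, math.copysign(math.inf, r), True
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return r, t, abs(t) > t_critical


# Hypothetical monthly % returns for two indices (illustration only)
index_a = [1, -2, 3, 0, 2, -1, 4, -3]
index_b = [2, -1, 3, 1, 1, -2, 5, -3]

# t_critical for alpha = 0.05 (two-tailed), df = n - 2 = 6, from a t table
r, t, significant = correlation_t_test(index_a, index_b, t_critical=2.447)
print(f"r = {r:.3f}, t = {t:.2f}, significant: {significant}")
```

For these made-up returns, *r* is about 0.94 and *t* is well above 2.447, so the correlation would be judged significant and a linear model could be fitted. With fewer observations the same *r* can fail the test, because the critical value grows as the degrees of freedom shrink.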

### 14.5 Predictions and Prediction Intervals

A key purpose of the linear regression model is prediction, provided that the correlation is significant. To generate a prediction or forecast, substitute the value of the independent variable (*x*) into the regression equation and solve for the dependent variable (*y*). When making predictions, it is generally recommended to use only values of *x* that fall within the range of the original data; predicting outside that range (extrapolation) assumes that the linear relationship continues where it has not been observed.
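
A minimal Python sketch of this prediction step follows. It refuses to extrapolate outside the observed range of *x*, and it also shows one common way to attach a prediction interval to a single new observation at $x_0$: $\hat{y} \pm t_{\alpha/2,\,n-2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}}$, where $s_e = \sqrt{\text{SSE}/(n-2)}$ is the standard error of the estimate and $S_{xx}=\sum(x_i-\bar{x})^2$. The helper names, the hypothetical data, and the caller-supplied critical value (2.447 for $\alpha = 0.05$, two-tailed, 6 degrees of freedom) are assumptions for illustration.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


def predict(xs, ys, x0, t_critical=None):
    """Predict y at x0 from the fitted line, refusing to extrapolate.

    If t_critical is given, also return a prediction interval for a
    single new observation at x0.
    """
    if not min(xs) <= x0 <= max(xs):
        raise ValueError(f"x0 = {x0} lies outside the data range [{min(xs)}, {max(xs)}]")
    a, b = fit_line(xs, ys)
    y_hat = a + b * x0  # substitute x0 into y-hat = a + b*x
    if t_critical is None:
        return y_hat
    n = len(xs)
    x_bar = sum(xs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))  # standard error of the estimate
    margin = t_critical * s_e * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)
    return y_hat, (y_hat - margin, y_hat + margin)


# Hypothetical data for illustration only
xs = [1, -2, 3, 0, 2, -1, 4, -3]
ys = [2, -1, 3, 1, 1, -2, 5, -3]

y_hat, (low, high) = predict(xs, ys, 2, t_critical=2.447)
print(f"y-hat = {y_hat:.3f}, 95% prediction interval: ({low:.3f}, {high:.3f})")

try:
    predict(xs, ys, 10)  # x = 10 is outside the observed range [-3, 4]
except ValueError as err:
    print("Refused:", err)
```

The interval widens as $x_0$ moves away from $\bar{x}$, which is one quantitative reason to stay within the range of the original data.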

### 14.6 Use of R Statistical Analysis Tool for Regression Analysis

R is an open-source statistical analysis tool that is widely used in the finance industry and is freely available online. It provides an integrated suite of functions for data analysis, graphing, and correlation and regression analysis. R is increasingly used for data analysis and statistics because it is open source and its user community constantly adds new features through contributed packages. It runs on many computing platforms, including Windows, macOS, and Linux.