Principles of Finance

14.6 Use of R Statistical Analysis Tool for Regression Analysis

Learning Outcomes

By the end of this section, you will be able to:

  • Generate correlation coefficients using the R statistical tool.
  • Generate linear regression models using the R statistical tool.

Generate Correlation Coefficients Using the R Statistical Tool

R is an open-source statistical analysis tool that is widely used in the finance industry. R is available as a free program and provides an integrated suite of functions for data analysis, graphing, and statistical programming. R provides many functions and capabilities for regression analysis.

Recall that most calculations in R are handled via functions.

The typical method for using functions in statistical applications is to first create a vector of data values. There are several ways to create vectors in R; the c function, which combines values into a vector, is one of the most common. For example, this R command creates a vector called salaries containing the data values 40,000, 50,000, 75,000, and 92,000:

      > salaries <- c(40000, 50000, 75000, 92000)
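As a small illustration of how functions operate on a vector, the sketch below rebuilds the salaries vector and applies a few built-in R functions to it. The particular functions chosen (length, mean, and indexing) and the variable names n_salaries, mean_salary, and first_salary are assumptions made for this example only.

```r
# Build the salaries vector with c(), as in the text
salaries <- c(40000, 50000, 75000, 92000)

# Functions operate on the whole vector at once
n_salaries   <- length(salaries)  # number of data values in the vector
mean_salary  <- mean(salaries)    # arithmetic mean of the values
first_salary <- salaries[1]       # R indexes vectors starting at 1

print(n_salaries)
print(mean_salary)
```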

To calculate the correlation coefficient r, we use the R command called cor.

As an example, consider the data set in Table 14.8, which tracks the return on the S&P 500 versus the return on Coca-Cola stock for a seven-month time period.

Month    S&P 500 Return (%)    Coca-Cola Return (%)
Jan       8                     6
Feb       1                     0
Mar       0                    -2
Apr       2                     1
May      -3                    -1
Jun       7                     8
Jul       4                     2
Table 14.8 Monthly Returns of Coca-Cola Stock versus Monthly Returns for the S&P 500

Create two vectors in R, one vector for the S&P 500 returns and a second vector for Coca-Cola returns:

      > SP500 <- c(8,1,0,2,-3,7,4)

      > CocaCola <- c(6,0,-2,1,-1,8,2)

The R command called cor returns the correlation coefficient for the x-data vector and y-data vector:

      > cor(SP500, CocaCola)
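The following sketch runs the cor command on the Table 14.8 data and then checks whether the correlation is statistically significant. The use of cor.test and the variable names r and test are additions for illustration; the text itself only calls cor.

```r
# Returns from Table 14.8
SP500    <- c(8, 1, 0, 2, -3, 7, 4)
CocaCola <- c(6, 0, -2, 1, -1, 8, 2)

# Pearson correlation coefficient between the two return series
r <- cor(SP500, CocaCola)
print(round(r, 4))   # approximately 0.9124

# cor.test() tests the null hypothesis that the true correlation is zero
test <- cor.test(SP500, CocaCola)
print(test$p.value)  # a small p-value suggests a significant correlation
```

Because r is close to 1, the two return series move strongly together over these seven months.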

Generate Linear Regression Models Using the R Statistical Tool

Assuming the correlation is significant, a linear model can be created in R with the command lm (for linear model), which provides the slope and y-intercept for the linear regression equation.

The format of the R command is

      lm(dependent_variable_vector ~ independent_variable_vector)

Notice the use of the tilde symbol as the separator between the dependent variable vector and the independent variable vector.
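The order of the variables around the tilde matters. As a minimal sketch, assume the Table 14.8 returns are stored as columns of a data frame named returns (a hypothetical name, along with fit_df and fit_reversed); the data argument of lm then tells R where to find the variables named in the formula.

```r
# Hypothetical data frame holding both return series as columns
returns <- data.frame(
  SP500    = c(8, 1, 0, 2, -3, 7, 4),
  CocaCola = c(6, 0, -2, 1, -1, 8, 2)
)

# Dependent variable on the left of ~, independent variable on the right
fit_df <- lm(CocaCola ~ SP500, data = returns)

# Reversing the formula fits a different line (S&P 500 regressed on Coca-Cola)
fit_reversed <- lm(SP500 ~ CocaCola, data = returns)

print(coef(fit_df))
print(coef(fit_reversed))
```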

We use the returns on Coca-Cola stock as the dependent variable and the returns on the S&P 500 as the independent variable, and thus the R command would be

      > lm(CocaCola ~ SP500)

      Call:
      lm(formula = CocaCola ~ SP500)

      Coefficients:
      (Intercept)        SP500
          -0.3453       0.8641

The R output provides the value of the y-intercept as -0.3453 and the value of the slope as 0.8641. Based on this, the linear model would be

$$\hat{y} = a + bx$$
$$\hat{y} = -0.3453 + 0.8641x$$

where x represents the S&P 500 return and $\hat{y}$ represents the predicted Coca-Cola stock return, both in percent.
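To see where these two numbers come from, the sketch below computes the slope and y-intercept directly from the standard least-squares formulas and compares them with the lm result. The formulas are the usual ones for simple linear regression and are not spelled out in this section; the names slope, intercept, and fit are chosen for this example.

```r
# Returns from Table 14.8
SP500    <- c(8, 1, 0, 2, -3, 7, 4)
CocaCola <- c(6, 0, -2, 1, -1, 8, 2)

# Least-squares slope: sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2)
slope <- sum((SP500 - mean(SP500)) * (CocaCola - mean(CocaCola))) /
  sum((SP500 - mean(SP500))^2)

# Least-squares intercept: ybar - slope * xbar
intercept <- mean(CocaCola) - slope * mean(SP500)

# The same coefficients as estimated by lm()
fit <- lm(CocaCola ~ SP500)

print(c(intercept = intercept, slope = slope))
print(coef(fit))
```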

The results can also be saved in an R object, here named model, by assigning the output of lm to it. To obtain more detailed results for the linear regression, the summary command can then be applied to that object, as follows:

      > model <- lm(CocaCola ~ SP500)
      > summary(model)
      Call:
      lm(formula = CocaCola ~ SP500)

      Residuals:
            1       2       3       4       5       6       7
      -0.5672 -0.5188 -1.6547 -0.3828  1.9375  2.2969 -1.1109

      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)  -0.3453     0.7836  -0.441  0.67783
      SP500         0.8641     0.1734   4.984  0.00416 **
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

      Residual standard error: 1.658 on 5 degrees of freedom
      Multiple R-squared:  0.8325,  Adjusted R-squared:  0.7989
      F-statistic: 24.84 on 1 and 5 DF,  p-value: 0.004161

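In this output, the y-intercept and slope are given in the Estimate column, and the residuals are listed for each of the seven observations. The output also includes additional statistical details regarding the regression analysis, such as the standard errors, the p-value for the slope (0.00416, which is significant at the 0.01 level), and the R-squared value (0.8325).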
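Individual pieces of the summary can also be pulled out by name rather than read off the printed output. The sketch below shows one way to do this; the variable names s, r_squared, slope_p, res, and resid_se are chosen for this example.

```r
# Returns from Table 14.8
SP500    <- c(8, 1, 0, 2, -3, 7, 4)
CocaCola <- c(6, 0, -2, 1, -1, 8, 2)

model <- lm(CocaCola ~ SP500)
s     <- summary(model)

# Named components of the summary object
r_squared <- s$r.squared                            # Multiple R-squared
slope_p   <- s$coefficients["SP500", "Pr(>|t|)"]    # p-value for the slope
resid_se  <- s$sigma                                # residual standard error

# Residuals: observed minus fitted Coca-Cola return for each month
res <- residuals(model)

print(round(r_squared, 4))
print(round(slope_p, 5))
print(round(res, 4))
```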

Predicted values and prediction intervals can also be generated within R.

First, we can create a structure in R called a data frame to hold the values of the independent variable for which we want to generate a prediction. For example, we would like to generate the predicted return for Coca-Cola stock, given that the return for the S&P 500 is 6%.

We use the R command called predict.

To generate a prediction for the linear regression equation called model, using the data frame where the value of the S&P 500 is 6, the R commands will be

      > a <- data.frame(SP500=6)
      > predict(model, a)

The output from the predict command indicates that the predicted return for Coca-Cola stock will be 4.8% when the return for the S&P 500 is 6%.
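The prediction is simply the regression equation evaluated at x = 6. As a quick check, the sketch below compares the predict result with the value computed by hand from the fitted coefficients; the names pred_r and by_hand are chosen for this example.

```r
# Returns from Table 14.8
SP500    <- c(8, 1, 0, 2, -3, 7, 4)
CocaCola <- c(6, 0, -2, 1, -1, 8, 2)

model <- lm(CocaCola ~ SP500)

# The data frame column name must match the variable name used in the formula
a <- data.frame(SP500 = 6)
pred_r <- predict(model, a)

# Same prediction computed directly as intercept + slope * x
by_hand <- coef(model)[["(Intercept)"]] + coef(model)[["SP500"]] * 6

print(pred_r)
print(by_hand)
```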

We can extend this analysis to generate a 95% prediction interval for this result by using the following R command, which adds an option to the predict command to generate a prediction interval:

      > predict(model,a, interval="predict")
         fit    lwr   upr
      1 4.839062 0.05417466 9.62395

Thus the 95% prediction interval for Coca-Cola return is (0.05%, 9.62%) when the return for the S&P 500 is 6%.
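For readers who want to see how such an interval is built, the sketch below reproduces it by hand. It assumes the standard prediction-interval formula for simple linear regression, which is not stated in this section, and the names x0, n, s, fit0, se_pred, t_crit, lower, upper, and pi_r are chosen for this example. Note that the full name of the option is "prediction"; the shorter "predict" is accepted by R through partial matching.

```r
# Returns from Table 14.8
SP500    <- c(8, 1, 0, 2, -3, 7, 4)
CocaCola <- c(6, 0, -2, 1, -1, 8, 2)

model <- lm(CocaCola ~ SP500)
x0    <- 6                      # S&P 500 return at which to predict
n     <- length(SP500)
s     <- summary(model)$sigma   # residual standard error

# Point prediction from the fitted line
fit0 <- coef(model)[["(Intercept)"]] + coef(model)[["SP500"]] * x0

# Standard error of a single new observation at x0
se_pred <- s * sqrt(1 + 1 / n +
  (x0 - mean(SP500))^2 / sum((SP500 - mean(SP500))^2))

# Critical t value for a 95% interval with n - 2 degrees of freedom
t_crit <- qt(0.975, df = n - 2)

lower <- fit0 - t_crit * se_pred
upper <- fit0 + t_crit * se_pred

# Interval as reported by predict()
pi_r <- predict(model, data.frame(SP500 = x0), interval = "prediction")

print(c(fit = fit0, lwr = lower, upr = upper))
print(pi_r)
```

The interval is wide because it is based on only seven observations and must cover a single future monthly return, not just the average return.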


This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute OpenStax.


© Jan 8, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License . The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.