A measure of the degree to which variation of one variable is related to variation in one or more other variables. The most commonly used correlation coefficient indicates the degree to which variation in one variable is described by a straight line relation with another variable.
Suppose that sample information is available on family income and years of schooling of the head of the household. A correlation coefficient of 0 would indicate no linear association at all between these two variables. A correlation of 1 would indicate perfect linear association (where all variation in family income could be associated with schooling, and vice versa).
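As a minimal sketch, with made-up income and schooling figures, the sample correlation coefficient can be computed directly:

```python
# Sketch: sample correlation between years of schooling and family income.
# The data values below are hypothetical, for illustration only.
import numpy as np

years_schooling = np.array([8, 10, 12, 12, 14, 16, 16, 18])
family_income = np.array([28, 33, 41, 38, 50, 60, 57, 70])  # thousands, made up

r = np.corrcoef(years_schooling, family_income)[0, 1]
print(f"sample correlation r = {r:.3f}")  # near 1 indicates strong linear association
```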
Definition:
A t test is obtained by dividing a regression coefficient by its standard error and then comparing the result to critical values for Student's t with the error df. It provides a test of the claim that βᵢ = 0 when all other variables have been included in the relevant regression model.
Example:
Suppose that 4 variables are suspected of influencing some response Y, and that the results of fitting Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + β₄X₄ include:
Variable | Regression coefficient | Standard error of regression coefficient
--- | --- | ---
1 | -3 | .5
2 | +2 | .4
3 | +1 | .02
4 | -.5 | .6
t calculated for variables 1, 2, and 3 would be 5 or larger in absolute value, while that for variable 4 would be less than 1. For most significance levels, the hypothesis βᵢ = 0 would be rejected for each of variables 1, 2, and 3. But notice that this is for the case when the other three variables have been included in the regression. For most significance levels, the hypothesis β₄ = 0 would be retained for the case where X₁, X₂, and X₃ are in the regression. Often this pattern of results will lead to computing another regression involving only X₁, X₂, and X₃, and examination of the t ratios produced for that case.
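As a sketch of the arithmetic above, each t ratio is simply the coefficient divided by its standard error; the critical value shown assumes an error df of 20, which is not given in the example:

```python
# Sketch: t ratios for the four coefficients in the table above,
# each computed as t = coefficient / standard error.
from scipy import stats

coefficients = [-3.0, 2.0, 1.0, -0.5]
standard_errors = [0.5, 0.4, 0.02, 0.6]

t_crit = stats.t.ppf(0.975, df=20)  # df = 20 is an assumed error df for illustration
for i, (b, se) in enumerate(zip(coefficients, standard_errors), start=1):
    t = b / se
    verdict = "reject beta_i = 0" if abs(t) > t_crit else "retain beta_i = 0"
    print(f"variable {i}: t = {t:+.2f} -> {verdict}")
# variables 1-3 give |t| >= 5; variable 4 gives |t| < 1
```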
Some variables seem to be related, so that knowing one variable's status allows us to predict the status of the other. This relationship can be measured and is called correlation. However, a high correlation between two variables in no way proves that a cause-and-effect relation exists between them. It is entirely possible that a third factor causes both variables to vary together.
The precision of the estimate of the Y variable depends on the range of the independent (X) variable explored. If we explore only a very small range of the X variable, the regression will be of little use. Also, extrapolation beyond the observed range of X is not recommended.
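A small sketch of why the range matters: for a simple regression with known error standard deviation σ (a value assumed here), the standard error of the slope is σ/√Σ(x − x̄)², which shrinks as the explored X range widens:

```python
# Sketch: SE(slope) = sigma / sqrt(sum((x - xbar)**2)) falls as the
# spread of the design points x increases.
import numpy as np

sigma = 2.0  # assumed error standard deviation
for half_range in (1, 5, 10):
    x = np.linspace(-half_range, half_range, 20)  # 20 design points
    se_slope = sigma / np.sqrt(np.sum((x - x.mean()) ** 2))
    print(f"X explored over +/-{half_range}: SE(slope) = {se_slope:.3f}")
```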
Most simply, since −5 is included in the 95% confidence interval for the slope, we can conclude that the evidence is consistent with the claim that β₁ = −5 at the 95% confidence level.
Using a t test:
H₀: β₁ = −5
H₁: β₁ ≠ −5
Since |t calculated| < t critical, we retain the null hypothesis that β₁ = −5.
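A minimal sketch of the same test in code, with a hypothetical fitted slope b, standard error s_b, and df = 23 standing in for the actual fitted values (which are not given here):

```python
# Sketch of the t test for H0: beta = -5. The values of b, s_b, and df
# below are assumed for illustration only.
from scipy import stats

b, s_b, df = -4.2, 0.9, 23          # hypothetical fitted slope, SE, error df
t_calc = (b - (-5)) / s_b           # t ratio against the hypothesized value -5
t_crit = stats.t.ppf(0.975, df)     # two-tailed critical value, alpha = .05

if abs(t_calc) < t_crit:
    print(f"|t| = {abs(t_calc):.2f} < {t_crit:.2f}: retain H0 (beta = -5)")
else:
    print(f"|t| = {abs(t_calc):.2f} >= {t_crit:.2f}: reject H0")
```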
True.
t(critical, df = 23, two-tailed, α = .02) = ± 2.5
t(critical, df = 23, two-tailed, α = .01) = ± 2.8
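Both critical values can be reproduced with SciPy (assuming it is available):

```python
# Sketch: two-tailed Student's t critical values for df = 23.
from scipy import stats

for alpha in (0.02, 0.01):
    t_crit = stats.t.ppf(1 - alpha / 2, df=23)  # upper-tail cutoff
    print(f"alpha = {alpha}: t critical = +/-{t_crit:.3f}")
# alpha = .02 gives about 2.500; alpha = .01 gives about 2.807
```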
- No. Most business statisticians would not want to extrapolate that far. If someone did, the estimate would be 110, but other factors probably come into play at 20 years.
- The population value for βᵢ, the change that occurs in Y with a unit change in Xᵢ, when the other variables are held constant.
- The population value for the standard error of the distribution of estimates of βᵢ.
- .8, .1, and df = 16 = 20 − 4.
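If these figures are read as a coefficient of .8, a standard error of .1, and an error df of 16 (20 observations less 4 estimated parameters), which is an assumption about what the answer refers to, the implied t ratio follows directly:

```python
# Sketch, assuming .8 is a coefficient, .1 its standard error, and
# df = 16 the error degrees of freedom (20 observations - 4 parameters).
from scipy import stats

t_calc = 0.8 / 0.1                  # t ratio = 8.0
t_crit = stats.t.ppf(0.975, df=16)  # two-tailed, alpha = .05 (about 2.12)
print(f"t = {t_calc:.1f} vs critical +/-{t_crit:.2f}")
```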