Linear Regression 6.2
The goal is to test the functional dependence (prediction) of forced vital capacity (FVC) in elderly subjects on their height.
To check the regression assumptions that the relationship between height and FVC is linear and that the residuals are normally distributed with equal variance along the regression line, we examined a histogram of the residuals, a P-P plot of the regression standardized residuals, and a scatterplot of the residuals against the predicted values. All three graphs were consistent with a linear relationship and normally distributed residuals.
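A minimal sketch of these diagnostic checks, assuming the data are available as NumPy arrays `height` and `fvc` (hypothetical names); a normal probability plot stands in here for the SPSS P-P plot:

```python
import matplotlib.pyplot as plt
from scipy import stats

def residual_diagnostics(height, fvc):
    # Fit the simple regression of FVC on height by least squares.
    result = stats.linregress(height, fvc)
    predicted = result.intercept + result.slope * height
    residuals = fvc - predicted
    standardized = (residuals - residuals.mean()) / residuals.std(ddof=1)

    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    # Histogram of standardized residuals (normality check).
    axes[0].hist(standardized, bins=20)
    axes[0].set_title("Histogram of standardized residuals")
    # Normal probability plot (used here in place of the SPSS P-P plot).
    stats.probplot(standardized, dist="norm", plot=axes[1])
    axes[1].set_title("Normal probability plot")
    # Standardized residuals vs. predicted FVC (linearity / equal variance check).
    axes[2].scatter(predicted, standardized, s=10)
    axes[2].axhline(0, color="gray")
    axes[2].set_title("Residuals vs. predicted FVC")
    plt.tight_layout()
    plt.show()
```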
Further, a linear regression analysis was performed to test whether the proportion of variance in FVC explained by height is statistically significant. The proportion of explained variance was significant (F(1, 798) = 699.4, p < .001), indicating that height has predictive ability for FVC.
Table 3. ANOVA for the Regression Equation, Height (cm) on Forced Vital Capacity (L)
Table 3 shows the partition of the variance in FVC: subtracting the residual sum of squares (328.90) from the total sum of squares (617.18) gives the sum of squares explained by the regression (288.28). Dividing the regression sum of squares by the total gives the coefficient of determination. The F ratio (699.43) provides the test of statistical significance, and the results show that FVC varies significantly with height.
|            | Sum of Squares | df  | Mean Square | F        |
|------------|----------------|-----|-------------|----------|
| Regression | 288.28         | 1   | 288.28      | 699.43** |
| Residual   | 328.90         | 798 | .41         |          |
| Total      | 617.18         | 799 |             |          |

** p < 0.01
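The quantities in Table 3 can be verified directly from the reported sums of squares; a short sketch (values taken from the table, not recomputed from raw data):

```python
from scipy import stats

# Sums of squares and degrees of freedom as reported in Table 3.
ss_regression = 288.28
ss_residual = 328.90
ss_total = ss_regression + ss_residual         # 617.18
df_regression, df_residual = 1, 798

r_squared = ss_regression / ss_total           # coefficient of determination, ~ .467
ms_regression = ss_regression / df_regression  # 288.28
ms_residual = ss_residual / df_residual        # ~ .41
f_ratio = ms_regression / ms_residual          # ~ 699.4
p_value = stats.f.sf(f_ratio, df_regression, df_residual)

print(f"R^2 = {r_squared:.3f}, F({df_regression}, {df_residual}) = {f_ratio:.1f}, p = {p_value:.3g}")
```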
The regression equation (y = b0 + b1X + e) is the linear equation used to fit the best straight line to the data. FVC is the dependent variable (y) and, as shown in Table 4, can be expressed as a function of a constant (b0) plus the slope (b1) times height: y = -7.193 + .062x, where x = height (the .002 in Table 4 is the standard error of the slope, not a term in the equation). The estimated slope is .062, and the 95% CI for the slope is .057 to .066. The 95% CI, which provides the lower and upper bounds for the unstandardized regression coefficient, does not include 0, suggesting that the slope is significantly different from 0, meaning that there is a linear relationship between height and FVC.
Table 4. Regression Coefficients
|             | B (unstandardized) | Std. Error | t     | Sig. | 95% CI for B, Lower Bound | 95% CI for B, Upper Bound |
|-------------|--------------------|------------|-------|------|---------------------------|---------------------------|
| (Constant)  | -7.19              | .385       | -18.7 | .000 | -7.94                     | -6.43                     |
| Height (cm) | .062               | .002       | 26.4  | .000 | .057                      | .066                      |
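The 95% CI for the Height coefficient in Table 4 is formed as B ± t × SE(B); a sketch using the rounded values from the table (small differences from the table's bounds reflect rounding of B and its standard error):

```python
from scipy import stats

b_height = 0.062      # unstandardized slope (B) from Table 4
se_height = 0.002     # standard error of the slope from Table 4
df_residual = 798     # residual degrees of freedom from Table 3

t_crit = stats.t.ppf(0.975, df_residual)   # ~ 1.96 for df this large
lower = b_height - t_crit * se_height      # ~ .058
upper = b_height + t_crit * se_height      # ~ .066
print(f"95% CI for the Height slope: {lower:.3f} to {upper:.3f}")
```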
The regression equation can also be used for an example prediction, here for a height of 160 cm:

y = b0 + b1X
= -7.193 + .062(160)
= -7.193 + 9.92
y = 2.73

An approximate 95% CI for this predicted value is centered at the prediction and uses the standard error of the estimate:

95% CI for predicted FVC = predicted value ± (1.96 × Std. Error of the Estimate)
= 2.73 ± (1.96 × .642)
= 2.73 ± 1.26
95% CI for predicted FVC = 1.47 to 3.99

Because the regression equation is not expected to predict y exactly (the data points tend to follow the line but are scattered around it), this interval gives a range of FVC values centered at the point predicted by the regression line for a height of 160 cm. It says nothing about the slope itself; the 95% CI for the slope (.057 to .066) is reported in Table 4 and, as noted above, excludes 0.
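A sketch of this worked example, using the constant and slope from Table 4 and the standard error of the estimate from Table 5 (outliers included):

```python
b0 = -7.193   # constant from Table 4
b1 = 0.062    # Height slope from Table 4
see = 0.642   # standard error of the estimate (Table 5, outliers included)

height = 160
predicted_fvc = b0 + b1 * height           # -7.193 + 9.92 = 2.73 L
margin = 1.96 * see                        # ~ 1.26
lower, upper = predicted_fvc - margin, predicted_fvc + margin
print(f"Predicted FVC at {height} cm: {predicted_fvc:.2f} L "
      f"(approximate 95% interval {lower:.2f} to {upper:.2f})")
```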
As shown in Table 5, excluding the outliers improves the model: r-squared is higher (variation in height explains more of the variation in FVC), the standard error of the estimate is lower, and the slope is farther from 0 (although in both cases the slope is significantly different from 0).
The most important change is the increase in the r-squared value: the model with the outliers explains about 47% of the variation in FVC, while the model without the outliers explains about 52%.
Table 5. Comparison of Model Summary.
Table 5 compares the model summary statistics (R, R-squared, adjusted R-squared, standard error of the estimate) and the regression equations with outliers included and with outliers excluded from the data.
|                     | Outliers Included   | Outliers Excluded   |
|---------------------|---------------------|---------------------|
| R                   | .683                | .723                |
| R squared           | .467                | .523                |
| Adj. R squared      | .466                | .522                |
| Std. Error          | .642                | .599                |
| Regression equation | Y = -7.193 + .062x  | Y = -7.788 + .065x  |
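A hypothetical sketch of how the Table 5 comparison could be reproduced, assuming `height` and `fvc` arrays and a boolean `outlier_mask` flagging the excluded cases (the rule used to identify the outliers is not shown in the text):

```python
import numpy as np
from scipy import stats

def summarize(height, fvc, label):
    # Fit FVC on height and report Table 5 style summary values.
    res = stats.linregress(height, fvc)
    print(f"{label}: R = {res.rvalue:.3f}, R^2 = {res.rvalue ** 2:.3f}, "
          f"equation: Y = {res.intercept:.3f} + {res.slope:.3f}x")

def compare_models(height, fvc, outlier_mask):
    # Compare the fit with all cases against the fit with outliers removed.
    summarize(height, fvc, "Outliers included")
    keep = ~np.asarray(outlier_mask, dtype=bool)
    summarize(height[keep], fvc[keep], "Outliers excluded")
```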