STAT6031 Homework 5 Fall 2016
Attach to front of homework paper
Student name:
Reading: Chapter 3, Chapter 4
Reminder:
DO NOT hand in unedited SAS output/code
ONLY include the results required
Answer questions IN ORDER; Include question number; Label graphs and tables
Include your SAS code for all questions at the end of the entire homework as an appendix
SAVE your SAS code, since you may be asked to continue problems on successive homeworks
Assigned problems:
Consider the following data set, which describes the relationship between the rate of an enzymatic reaction (V) and the substrate concentration (C). A common model for this relationship is the Michaelis-Menten model,

$$V = \frac{\theta_1 C}{\theta_2 + C},$$

where θ1 is the maximum rate of the reaction and θ2 describes how quickly the reaction reaches its maximum rate. Under this model, 1/V can be written as a linear model with explanatory variable 1/C:

$$\frac{1}{V} = \frac{1}{\theta_1} + \frac{\theta_2}{\theta_1}\cdot\frac{1}{C}$$
Concentration (C)   0.02   0.02   0.06   0.06   0.11   0.11
Rate (V)              76     47     96    106    123    139
Concentration (C)   0.22   0.22   0.56   0.56   1.10   1.10
Rate (V)             159    152    191    200    207    201
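One way to get the table above into SAS is sketched below. It assumes the data set name original (the name the hint for the transformed variables uses) and lowercase variables c (concentration) and v (rate); the pairs are entered in the same order as the table.

```sas
/* Read (concentration, rate) pairs; @@ keeps reading several pairs per line */
data original;
  input c v @@;
  datalines;
0.02 76   0.02 47   0.06 96   0.06 106  0.11 123  0.11 139
0.22 159  0.22 152  0.56 191  0.56 200  1.10 207  1.10 201
;
run;
```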
Generate a scatterplot of V vs C. Does their relationship appear to be linear?
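A minimal sketch of the scatterplot, assuming the raw data sit in a data set named original with variables c and v:

```sas
/* Scatterplot of reaction rate V against substrate concentration C */
title "Reaction rate vs. substrate concentration";
proc sgplot data=original;
  scatter x=c y=v;
  xaxis label="Substrate concentration (C)";
  yaxis label="Reaction rate (V)";
run;
title;
```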
Define new variables for 1/V and 1/C in SAS, and generate a scatterplot of the new variables. Does the relationship appear linear? Do any assumptions appear to be violated? (Hint: is the variance constant?) If the data set original contains the raw data, the new variables can be defined as follows:
data reaction; set original; vinv = 1/v; cinv = 1/c;  /* 1/V and 1/C */
run;
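A possible plot of the transformed variables, assuming the data set reaction created by the step above. The straight-line overlay is only a visual aid for judging linearity and whether the spread of 1/V changes with 1/C.

```sas
/* 1/V against 1/C, with a least squares line drawn for reference */
proc sgplot data=reaction;
  scatter x=cinv y=vinv;
  reg x=cinv y=vinv / nomarkers;
  xaxis label="1/C";
  yaxis label="1/V";
run;
```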
How is the distribution of 1/C different from the distribution of C? (Report their respective mean, median, standard deviation, and range; check normality and symmetry using the univariate procedure.)
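A sketch of the comparison, assuming the data set reaction with variables c and cinv. The NORMAL option requests tests for normality; the mean, median, standard deviation, and range appear in the Moments and Basic Statistical Measures tables.

```sas
/* Summary statistics, normality tests, and plots for C and 1/C */
proc univariate data=reaction normal plot;
  var c cinv;
  histogram c cinv / normal;
  qqplot c cinv / normal(mu=est sigma=est);
run;
```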
The error sum of squares (SSE) for this model was found to be 7. If there were n = 16 observations, provide the best estimate of σ².
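As a worked check, assuming the fitted model is the two-parameter straight line (an intercept and a slope), the error degrees of freedom are n − 2 and the usual unbiased estimate of σ² is the mean squared error:

$$\hat{\sigma}^2 = \text{MSE} = \frac{\text{SSE}}{n-2} = \frac{7}{16-2} = 0.5$$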
Determine the least squares regression line for 1/V vs. 1/C. Save the residuals and predicted values. Does the residual plot suggest any problems?
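A minimal sketch of the fit and residual plot, assuming the data set reaction from above; the output data set name reactfit and the variable names pred and resid are illustrative choices, not required names.

```sas
/* Least squares fit of 1/V on 1/C; save predicted values and residuals */
proc reg data=reaction;
  model vinv = cinv;
  output out=reactfit p=pred r=resid;
run;
quit;

/* Residuals against predicted values, with a reference line at zero */
proc sgplot data=reactfit;
  scatter x=pred y=resid;
  refline 0 / axis=y;
run;
```

Under the linearized model the fitted intercept estimates 1/θ1 and the fitted slope estimates θ2/θ1, so θ1 can be recovered as 1/intercept and θ2 as slope/intercept.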
Use the “grade point average” data in KNNL #1.19 (CH01PR19.TXT) for the following three questions.
Describe the distribution of the explanatory variable (report the mean, median, standard deviation, range, and extreme values; check normality and symmetry) using the univariate procedure. Show the plots and output that were helpful in learning about this variable.
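A sketch for reading the data and describing the explanatory variable. The file path is a placeholder, and the column order (GPA first, entrance test score second) and the variable name act are assumptions; the name gpa and the data set name a1 follow the data step shown later in this homework.

```sas
/* Read the KNNL 1.19 data (path and column order are assumptions) */
data a1;
  infile 'CH01PR19.TXT';
  input gpa act;
run;

/* Moments, extreme observations, normality tests, and plots for the score */
proc univariate data=a1 normal plot;
  var act;
  histogram act / normal;
  qqplot act / normal(mu=est sigma=est);
run;
```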
Run the linear regression to predict GPA from the entrance test score, and obtain the residuals (DO NOT include a list of the residuals in your solution).
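A minimal sketch of the regression, assuming the data set a1 with variables gpa and act; gpafit, resid, and pred are illustrative names. The residuals go into a data set rather than being printed.

```sas
/* Regress GPA on the entrance test score; keep residuals and fitted values */
proc reg data=a1;
  model gpa = act;
  output out=gpafit r=resid p=pred;
run;
quit;
```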
Verify that the sum of the residuals is zero by running proc univariate on the output data set from the regression.
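Assuming the regression output data set is named gpafit with residual variable resid, the check can look like this; the "Sum Observations" entry in the Moments table should be zero up to rounding error.

```sas
/* Moments table includes "Sum Observations" for the residuals */
proc univariate data=gpafit;
  var resid;
run;
```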
Plot the residuals versus the explanatory variable and briefly describe the plot, noting any unusual patterns or points.
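A possible version of this plot, assuming the data set gpafit with variables act and resid:

```sas
/* Residuals against the entrance test score, with a zero reference line */
proc sgplot data=gpafit;
  scatter x=act y=resid;
  refline 0 / axis=y;
  xaxis label="Entrance test score";
  yaxis label="Residual";
run;
```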
Plot the residuals versus the order in which the data appear in the data file. (Hint: define seq=_n_ in a data step and then plot resid*seq.) Do the residuals seem to depend on the order?
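A sketch of the hint, assuming the regression output data set gpafit (PROC REG writes it in the same order as the input file). The automatic variable _N_ counts DATA step iterations, so seq records the row order; gpaseq is an illustrative name.

```sas
/* seq = position of each observation in the original file */
data gpaseq;
  set gpafit;
  seq = _n_;
run;

/* Residuals in file order, joined by a line to make trends easier to see */
proc sgplot data=gpaseq;
  series x=seq y=resid / markers;
  refline 0 / axis=y;
run;
```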
Examine the distribution of the residuals by producing a histogram and a normal probability plot of the residuals with the histogram and qqplot statements in proc univariate. What do you conclude?
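A minimal sketch, again assuming gpafit and resid from the regression step; the NORMAL option also adds formal normality tests to the output.

```sas
/* Histogram with fitted normal curve and normal Q-Q plot of the residuals */
proc univariate data=gpafit normal;
  var resid;
  histogram resid / normal;
  qqplot resid / normal(mu=est sigma=est);
run;
```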
Change the data set by changing the value of the GPA for the last observation from 2.948 to 29.48 (i.e., simulating a typo). You can do this in a data step. For example,
data a2; set a1; if _n_ eq 120 then gpa = 29.48;  /* corrupt the last GPA value */
run;
An alternative is simply to edit the data file.
Make a table comparing the results of this analysis with the results of the analysis of the original data. Include the following in the table:
fitted equation
t-test for the slope, with standard error and p-value
R²
the estimate of σ²
Summarize the differences.
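Assuming the modified data set a2 has been created as described above, refitting the same model gives the numbers for the comparison table: the Parameter Estimates table holds the fitted equation and the slope's standard error, t value, and p-value, while R-Square and the error Mean Square (the estimate of σ²) appear with the ANOVA table. The output names gpafit2, resid, and pred are illustrative.

```sas
/* Refit the original model on the data set containing the typo */
proc reg data=a2;
  model gpa = act;
  output out=gpafit2 r=resid p=pred;
run;
quit;
```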
Repeat parts (b), (c), and (d) from Problem #3 and explain how these plots help you detect the unusual observation.
Consider the following SAS output, which gives 5 confidence intervals for the mean of Y. If you wanted to guarantee that the joint coverage of the five confidence intervals was at least 95%, what confidence level would you use when forming each interval under the Bonferroni correction? Compute this adjusted confidence interval for the mean of Y when X = 5. (Note that some observations have been omitted from the output.)
                        Analysis of Variance

                               Sum of          Mean
Source             DF         Squares        Square    F Value    Pr > F
Model               1           16183         16183     805.62    <.0001
Error              16       321.39597      20.08725
Corrected Total    17           16504

Root MSE              4.48188    R-Square     0.9805
Dependent Mean       64.00000    Adj R-Sq     0.9793
Coeff Var             7.00294

                        Parameter Estimates

                     Parameter      Standard
Variable     DF       Estimate         Error    t Value    Pr > |t|
Intercept     1       -2.32215       2.56435      -0.91      0.3786
x             1       14.73826       0.51926      28.38      <.0001

                        Output Statistics

          Dep Var   Predicted      Std Error
Obs   x         y       Value   Mean Predict       95% CL Mean       Residual
  3   5   78.0000     71.3691         1.0878    69.0630    73.6752      6.6309
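A sketch of the Bonferroni calculation using only numbers from the output above (error DF 16, predicted value 71.3691, and standard error of the mean prediction 1.0878 at x = 5). Splitting α = 0.05 evenly across the 5 intervals gives each one a confidence level of 1 − 0.05/5 = 0.99; the data set name bonf is arbitrary.

```sas
/* Bonferroni-adjusted confidence interval for the mean of Y at x = 5 */
data bonf;
  alpha = 0.05;                               /* family-wise error rate */
  g     = 5;                                  /* number of intervals */
  level = 1 - alpha/g;                        /* per-interval confidence level */
  df    = 16;                                 /* error DF from the ANOVA table */
  yhat  = 71.3691;                            /* Predicted Value, Obs 3 */
  se    = 1.0878;                             /* Std Error Mean Predict, Obs 3 */
  tcrit = quantile('T', 1 - alpha/(2*g), df); /* t(0.995; 16) */
  lower = yhat - tcrit*se;
  upper = yhat + tcrit*se;
  put level= tcrit= lower= upper=;            /* results go to the log */
run;
```

With t(0.995; 16) ≈ 2.921, the interval is roughly (68.19, 74.55), noticeably wider than the unadjusted 95% limits (69.0630, 73.6752).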