

STATS 330: Advanced Statistical Modelling
Assignment 3, Semester 1, 2021
Total: 65 marks
Due: 23:59, Tuesday 11 May 2021

Notes:
(i) Write your assignment using R Markdown. Knit your report to either a Word or PDF document.
(ii) Include all relevant code and output in the final document.
(iii) 5 presentation marks are available. Please think of your markers: keep your code and plots neat and remember to check your spelling. (R has a built-in spellchecker!) Note that 1 of these presentation marks will be given for the correct submission of your assignment. To gain this mark you must submit:
• A signed cover sheet.
• A Word or PDF document that contains your answers.
• The R Markdown file used to create your answer document.

1. The dataset for this question concerns survival time for patients undergoing a particular type of liver operation. The dataset contains the following variables:

bcs          blood clotting score.
pindex       prognostic index.
enzyme_test  enzyme function test score.
liver_test   liver function test score.
age          age, in years.
gender       indicator variable for gender (0 = male, 1 = female).
alc_mod      indicator variable for history of alcohol use (0 = None, 1 = Moderate).
alc_heavy    indicator variable for history of alcohol use (0 = None, 1 = Heavy).
y            survival time in days.

This data is included in the R package “olsrr”. To access the dataset:
• install the olsrr package.
• use library(olsrr) to access this package.
• the data set will be in a data frame called surgical.

(a) [2 marks] Produce the output from str(surgical) and summary(surgical).

Note that the variables gender, alc_mod and alc_heavy are factors. Since their levels have already been specified in the manner of indicator variables, we don’t need to specify them as factors in R. However, when we come to interpret their coefficients for a fitted model, we need to remember they are indicator variables rather than numeric variables.

(b) [4 marks] Create a box plot for survival time. Comment on what this plot and the output from summary() indicate about the distribution of survival times.

(c) [8 marks] Create a pairs() plot. What do these plots tell you about the relationship between survival time and the other variables? Also comment on relationships between the explanatory variables.

(d) [8 marks] Fit the linear model for survival time y that uses all of the other variables in the dataset as explanatory variables. Include the output from summary() and look at the usual set of diagnostic plots. Do these plots indicate any problems with this model?

(e) [5 marks] Now create a “gam” plot to investigate the relationships between each of the numeric explanatory variables and survival time. Use this plot to evaluate whether it is reasonable to model each of these relationships as being linear.

(f) [12 marks] Now consider the possibility that using log(y) as the response will improve the model. Repeat parts (d) and (e) for the model that uses log(y) as the response.

(g) [3 marks] Based on the above results, is it more appropriate to use y or log(y) as the response? Explain your answer.

(h) [5 marks] For the model you selected in (g), use dredge() to search for the best model according to the AICc criterion.
i. Which explanatory variables are included in the best model from this search?
ii. Are there any other models which are supported by the data almost as much as the best model? Explain your answer.
iii. Based on the results of your search, divide the explanatory variables into three groups: (i) those that should definitely be included as an explanatory variable, (ii) those that possibly could be included and (iii) those that should not be included.

(i) [3 marks] Repeat the model search using BIC as the model selection criterion. Does this change which model is identified as best?

(j) [10 marks] For the best model from the search you did in (h), describe the impact that each of the explanatory variables has on the expected survival time. Note that you need to quantify the impact – simply stating that expected survival time increases or decreases is not a sufficient answer.

5 marks for presentation.
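
The dataset access steps listed in Question 1 could be sketched in R roughly as follows. This is a minimal sketch, assuming a CRAN mirror is reachable from your R session. The requireNamespace() guard and the names(), dim() and head() checks are additions for illustration only and are not part of the brief.

```r
# Install olsrr only if it is missing
# (assumption: a CRAN mirror is reachable from this session).
if (!requireNamespace("olsrr", quietly = TRUE)) {
  install.packages("olsrr")
}

# Attach the package; the brief states that the data frame is called `surgical`.
library(olsrr)

# Quick sanity checks (illustrative only, not required by the brief).
names(surgical)  # column names
dim(surgical)    # number of rows and columns
head(surgical)   # first six rows
```

Running the checks before starting part (a) confirms that the package loaded and that the column names match the variable list in the brief.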
