Please read the following notes before attempting the coursework:
This coursework accounts for 100% of the element. You can achieve a maximum of 100 marks. A maximum of 10 marks will be awarded for the structure of your report, the presentation of the regression analysis output, clearly labelled graphs, and your R skills.
Your Report: Attempt all questions in both parts. Complete this work in R and write a report using either LaTeX or a Word document. Use an appropriate referencing style to cite sources where required. Be sure to lay out your script file clearly and carefully, so that each individual question can be clearly identified and so that whoever marks your coursework can understand what you have done.
Avoid copying all your R code into your report. Instead, refer to it clearly in your text, e.g. see Task 1 part (a).
Further information on how to write about a regression analysis is provided in Chapter 6 of your lecture notes, Guidelines on Reporting a Regression.
Your Submission: Your submission should include two files: a report and R script files. The page limit for your report is 8 pages including all printouts and appendices.
Part 1: Plasma ferritin concentration study
In this coursework you will assess the effect of a collection of explanatory variables on the plasma ferritin concentration (Ferr) in 202 Australian athletes. The file "Sports Data CW.csv" contains the data on the plasma ferritin concentration as well as a selection of demographic variables of 202 male and female athletes. In particular, the data set comprises observations on the following eleven variables:
Variable: Description
Sport: Type of sport
Sex: Male or female
LBM: Lean body mass
RCC: Red cell count
WCC: White cell count
Hc (%): Hematocrit (Hc) is the volume percentage (vol%) of red blood cells in blood. It is normally 47% ±5% for men and 42% ±5% for women.[1]
Hg (g/dl): Hemoglobin (Hg) is the protein contained in red blood cells that is responsible for delivery of oxygen to the tissues. The normal Hg level for males is 14 to 18 g/dl; that for females is 12 to 16 g/dl.
BMI: Body mass index = weight/height^2
SSF (mm): Sum of skin folds
% Bfat: % body fat
Ferr (pmol/L): Plasma ferritin concentration
Task 1: Using R, read the data into a data frame called, e.g., Athletes and:
(a) Produce a table of summary statistics and draw appropriate plots to visually investigate the relationship between these eleven variables. Comment on your table and plots.
(10 marks)
(b) Draw a histogram and a Q-Q plot of Ferr, and comment on them. Is the distribution of Ferr close to the normal distribution?
(4 marks)
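Task 1 can be sketched in R roughly as follows. This is a minimal illustration, not a model answer: because the CSV file is not available here, the block builds a simulated stand-in data frame, and the column names (Sex, LBM, BMI, Ferr) are assumptions about how the real file is labelled. With the real data, the read.csv() call shown in the comment would replace the simulation.

```r
# Stand-in data. With the real file one would instead use something like
#   Athletes <- read.csv("Sports Data CW.csv")
# The column names below are assumed, not taken from the actual file.
set.seed(1)
n <- 202
Athletes <- data.frame(
  Sex  = factor(sample(c("male", "female"), n, replace = TRUE)),
  LBM  = rnorm(n, mean = 65, sd = 13),
  BMI  = rnorm(n, mean = 23, sd = 3),
  Ferr = rlnorm(n, meanlog = 4.2, sdlog = 0.6)  # right-skewed by construction
)

# (a) Summary statistics and plots of the relationships between variables
summary(Athletes)
num_vars <- Athletes[sapply(Athletes, is.numeric)]
pairs(num_vars, main = "Pairwise scatterplots")
boxplot(Ferr ~ Sex, data = Athletes, ylab = "Ferr")

# (b) Histogram and normal Q-Q plot of Ferr
hist(Athletes$Ferr, breaks = 20, main = "Histogram of Ferr", xlab = "Ferr")
qqnorm(Athletes$Ferr)
qqline(Athletes$Ferr)

# A formal normality test to accompany the plots
sw <- shapiro.test(Athletes$Ferr)
sw$p.value
```

The Shapiro-Wilk test is only one possible supplement to the plots; the histogram and Q-Q plot remain the main evidence the task asks for.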
Randomly divide the dataset into two sets, training (n1 = 141) and testing (n2 = 61) (see Appendix 1 for an explanation of how to do this).
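Appendix 1 is not reproduced here, so the following is only one common way to make such a split, using sample() on the row indices. The data frame is a simulated stand-in with an assumed id column, and the seed value is arbitrary.

```r
set.seed(2024)   # any fixed seed makes the split reproducible
n <- 202
# Stand-in for the real data frame; id and Ferr are illustrative columns
Athletes <- data.frame(id = seq_len(n), Ferr = rlnorm(n, 4.2, 0.6))

# Draw 141 row indices without replacement for the training set
train_idx <- sample(seq_len(n), size = 141)
train <- Athletes[train_idx, ]
test  <- Athletes[-train_idx, ]   # the remaining 61 rows

c(train = nrow(train), test = nrow(test))
```

Recording the seed in the script file lets a marker reproduce exactly the same training and testing sets.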
Task 2: Use the training dataset to
(a) Write down the equation of a regression model with Ferr as the response and the other ten variables as predictors.
(2 marks)
(b) Fit the model in (a), identify insignificant predictors and remove them from the model. Is the full model better than the smaller model? Use an appropriate test or score to support your argument.
(10 marks)
(c) i. Check the constant variance, independence and normality assumptions of the errors for the model in part (b). Do these assumptions hold for your model? If not, choose an appropriate transformation of the response variable and repeat steps (a)-(c)(i) for the transformed response variable.
(12 marks)
ii. Check for outliers, large leverage and influential points. How would you deal with any possible outliers, large leverage or influential points?
(4 marks)
(d) For the model obtained in part (c), determine which of the significant predictors has the largest estimated effect on Ferr. Is this effect also the most statistically significant? Interpret the effect of the significant variables on Ferr.
(10 marks)
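The workflow in Task 2 (fit, reduce, compare, check assumptions, look for unusual points) might look like the sketch below. It uses simulated training data with only three assumed predictors (LBM, BMI, WCC), and the choice to drop WCC and to use a log transformation is built into the simulation for illustration; with the real data those decisions must come from the output. The cut-offs 4/n, 2p/n and 3 are common rules of thumb, not values given in the brief.

```r
set.seed(3)
n <- 141
# Simulated stand-in for the training set; column names are assumptions
train <- data.frame(
  LBM = rnorm(n, 65, 13),
  BMI = rnorm(n, 23, 3),
  WCC = rnorm(n, 7, 1.8)
)
# Response depends on LBM and BMI only, with multiplicative noise
train$Ferr <- exp(2.5 + 0.02 * train$LBM + 0.03 * train$BMI + rnorm(n, sd = 0.5))

# (b) Full model versus a smaller model
full    <- lm(Ferr ~ LBM + BMI + WCC, data = train)
reduced <- lm(Ferr ~ LBM + BMI, data = train)   # WCC dropped for illustration

summary(full)$coefficients   # t-tests for the individual predictors
anova(reduced, full)         # partial F-test for the dropped term
AIC(reduced, full)           # lower AIC indicates the preferred model

# (c)(i) Residual diagnostics: residuals vs fitted, Q-Q, scale-location, leverage
par(mfrow = c(2, 2))
plot(reduced)
par(mfrow = c(1, 1))

# Residuals against observation order as a rough check on independence
plot(resid(reduced), type = "b", xlab = "Observation order", ylab = "Residual")

# One candidate transformation if variance grows with the fitted values
log_fit <- lm(log(Ferr) ~ LBM + BMI, data = train)

# (c)(ii) Outliers, leverage and influence, using rule-of-thumb cut-offs
cooks <- cooks.distance(log_fit)
lev   <- hatvalues(log_fit)
rstud <- rstudent(log_fit)
p     <- length(coef(log_fit))
flagged <- which(cooks > 4 / n | lev > 2 * p / n | abs(rstud) > 3)
flagged
```

Flagged observations are candidates for closer inspection rather than automatic deletion; refitting with and without them shows how much the conclusions depend on them.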
Task 3: Model evaluation. Use the testing dataset to evaluate your model by predicting Ferr in the testing subset (see how to do this in Appendix 2). Using appropriate plots or statistical tests, show whether the predictions are close to the observed Ferr in the testing set.
(8 marks)
In both Tasks 2 and 3, use a significance level of 0.05.
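Appendix 2 is not shown here, so the following is a hedged sketch of one way to carry out the evaluation in Task 3 with predict(). The data are simulated, the model log(Ferr) ~ LBM + BMI is an assumed example rather than the model the coursework will arrive at, and the paired t-test is just one possible check for systematic bias.

```r
set.seed(4)
n <- 202
# Simulated stand-in data; column names and the model form are assumptions
dat <- data.frame(LBM = rnorm(n, 65, 13), BMI = rnorm(n, 23, 3))
dat$Ferr <- exp(2.5 + 0.02 * dat$LBM + 0.03 * dat$BMI + rnorm(n, sd = 0.5))

idx   <- sample(seq_len(n), 141)
train <- dat[idx, ]
test  <- dat[-idx, ]

fit <- lm(log(Ferr) ~ LBM + BMI, data = train)

# Predictions on the log scale with 95% prediction intervals (level 0.05)
pred <- predict(fit, newdata = test, interval = "prediction", level = 0.95)
obs  <- log(test$Ferr)

# Root mean squared prediction error and empirical interval coverage
rmse     <- sqrt(mean((obs - pred[, "fit"])^2))
coverage <- mean(obs >= pred[, "lwr"] & obs <= pred[, "upr"])
c(rmse = rmse, coverage = coverage)

# Observed against predicted; points near the dashed line indicate good fit
plot(pred[, "fit"], obs,
     xlab = "Predicted log(Ferr)", ylab = "Observed log(Ferr)")
abline(0, 1, lty = 2)

# Paired t-test for a systematic difference between observed and predicted
t.test(obs, pred[, "fit"], paired = TRUE)
```

If the response was transformed, predictions can be compared on the transformed scale as above, or back-transformed with exp() to the original Ferr scale before plotting.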
Part 2: Bayesian Inference
An unknown value θ has been transmitted to you over a noisy channel. Assume the noise is normally distributed with mean 0 and known variance 4, so the value x that you receive is modelled by N(θ, 4). Based on previous communications, your prior knowledge of θ is N(12, 9).
(a) Suppose a value is transmitted to you and you receive it as x = 13.25. Obtain the posterior distribution of θ.
(b) On the same plot in R, draw the prior, likelihood and posterior distribution curves. Explain how your belief about θ is updated after adding information from the data to the prior information.
(c) Suppose the same value θ is transmitted to you n times. You receive these signals plus noise as x1, ..., xn with sample mean x̄. Assuming θ ~ N(θ0, σ0^2), obtain a formula for the posterior mean and variance of the mean parameter.
(d) Suppose the same value θ is sent to you 20 times. You receive these signals plus noise as x1, ..., x20 with sample mean x̄ = 11.85. Using the same prior and known variance σ^2 as in part (a), obtain the posterior distribution for θ. Plot the prior, likelihood and posterior on the same graph. Describe how the data change your belief about the true value of θ.
(e) How do the posterior mean and variance change if more data are received? What is gained by sending the same signal multiple times? Answer this question by referring to the formulae you obtained in part (c).
Total: 30 marks
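For Part 2, the standard conjugate result for a normal likelihood with known variance σ^2 and a normal prior N(θ0, σ0^2) is that the posterior is normal with variance 1 / (1/σ0^2 + n/σ^2) and mean equal to that variance times (θ0/σ0^2 + n·x̄/σ^2). The sketch below codes this update and draws the three curves for the single-observation case. The helper name posterior_normal and its argument names are introduced here for illustration only, and the plotting range is an arbitrary choice.

```r
# Conjugate normal-normal update with known data variance sigma2.
# posterior_normal() is a helper defined here for illustration only.
posterior_normal <- function(xbar, n, sigma2, theta0, sigma02) {
  post_var  <- 1 / (1 / sigma02 + n / sigma2)
  post_mean <- post_var * (theta0 / sigma02 + n * xbar / sigma2)
  c(mean = post_mean, var = post_var)
}

# Single observation x = 13.25, and 20 observations with sample mean 11.85
post_a <- posterior_normal(xbar = 13.25, n = 1,  sigma2 = 4, theta0 = 12, sigma02 = 9)
post_d <- posterior_normal(xbar = 11.85, n = 20, sigma2 = 4, theta0 = 12, sigma02 = 9)
rbind(single = post_a, twenty = post_d)

# Prior, likelihood and posterior on one plot for the single observation
theta <- seq(4, 20, length.out = 400)
plot(theta, dnorm(theta, post_a["mean"], sqrt(post_a["var"])), type = "l",
     xlab = expression(theta), ylab = "Density", main = "Single observation")
lines(theta, dnorm(theta, mean = 12, sd = 3), lty = 2)       # prior N(12, 9)
lines(theta, dnorm(13.25, mean = theta, sd = 2), lty = 3)    # likelihood in theta
legend("topleft", legend = c("Posterior", "Prior", "Likelihood"),
       lty = c(1, 2, 3), bty = "n")
```

Calling posterior_normal() with increasing n shows the posterior variance shrinking and the posterior mean moving from the prior mean towards the sample mean, which is the behaviour the formulae describe.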