MAS8404 Project
Project brief
In this project, you will analyse the BreastCancer data set, which concerns characteristics of breast tissue samples collected from 699 women in Wisconsin using fine needle aspiration cytology (FNAC). This is a type of biopsy procedure in which a thin needle is inserted into an area of abnormal-appearing breast tissue. Nine easily assessed cytological characteristics, such as uniformity of cell size and shape, were measured for each tissue sample on a scale of one to ten. Smaller numbers indicate cells that looked healthier in terms of that characteristic. Further histological examination established whether each of the samples was benign or malignant. The objective of the clinical experiment was to determine the extent to which a tissue sample could be classified as benign or malignant using only the nine cytological characteristics.
For the purposes of this project, you may assume that the patients can be regarded as a random sample from the population of women experiencing symptoms of breast cancer.
The data set is part of the mlbench package. The package can be installed by typing the following into the console:
> install.packages("mlbench")
It can then be loaded into R and inspected as follows:
> ## Load mlbench package
> library(mlbench)
> ## Load the data
> data(BreastCancer)
> ## Check size
> dim(BreastCancer)
[1] 699 11
> ## Print first few rows
> head(BreastCancer)
Id Cl.thickness Cell.size Cell.shape Marg.adhesion Epith.c.size Bare.nuclei
Bl.cromatin Normal.nucleoli Mitoses Class
More information on the variables can be found by typing ?BreastCancer in the console.
Your goal is to build a classifier for the Class – benign or malignant – of a tissue sample based on (at least some of) the nine cytological characteristics. It should be stressed that this is a real data set and there is no "correct" answer. Instead, what is required is evidence of an understanding of the main statistical ideas, sound interpretation of results, sensible and reasoned comparisons of classifiers, and demonstration of competence in the use of R as a tool for data analysis.
This part of the project should be written up as a coherent report, giving consideration to the points detailed in Section 1.1.1 below. You may like to include R code in your report. Alternatively, you can simply place the code in an Appendix and refer to it as appropriate. You do not need to comprehensively describe everything you have done to explore and model the data. However, you should provide a narrative which details and justifies the salient features of your approach, in addition to reporting and interpreting your results.
1.1.1 Points to consider
> ## Print 24th row of BreastCancer data and note there is an NA in the
> ## Bare.nuclei column:
> BreastCancer[24,]
Id Cl.thickness Cell.size Cell.shape Marg.adhesion Epith.c.size Bare.nuclei
24 1057013 8 4 5 1 2 <NA>
Bl.cromatin Normal.nucleoli Mitoses Class
24 7 3 1 malignant
> ## Test whether each element on the 24th row is an NA:
> is.na(BreastCancer[24,])
Id Cl.thickness Cell.size Cell.shape Marg.adhesion Epith.c.size Bare.nuclei
24 FALSE FALSE FALSE FALSE FALSE FALSE TRUE
Bl.cromatin Normal.nucleoli Mitoses Class
24 FALSE FALSE FALSE FALSE
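The check above looks at a single row. The same idea can be applied to the whole data set, as in the sketch below. Two assumptions here are not part of the brief: incomplete rows are simply dropped with na.omit(), and the nine characteristics (stored as factors in mlbench) are converted to numeric scores. The object names na_counts, bc and predictors are illustrative only.

```r
library(mlbench)
data(BreastCancer)

# Count the missing values in every column
na_counts <- colSums(is.na(BreastCancer))
print(na_counts)

# Assumption: remove rows that contain any missing value
bc <- na.omit(BreastCancer)

# The nine cytological characteristics are stored as factors;
# convert them to numeric scores (assumption: treat the scale as quantitative)
predictors <- setdiff(names(bc), c("Id", "Class"))
bc[predictors] <- lapply(bc[predictors],
                         function(x) as.numeric(as.character(x)))

# Inspect the cleaned data and the class balance
str(bc)
print(table(bc$Class))
```

Dropping incomplete rows is only one option. Whichever approach is taken, it is worth stating and justifying in the report.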
For the variants of logistic regression, you should present the coefficients of the fitted model, and any other useful graphical or numerical summaries. For LDA and QDA present estimates of the group means. In each case, discuss what your results show. For example, which variables drop out of the model when you use subset selection or the LASSO? What do the parameters tell you about the relationships between the response and predictor variables?
Why or why not?
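As a minimal sketch of the logistic regression and LDA summaries requested above, the code below fits both models to all nine characteristics. It assumes that rows with missing values are dropped and that the characteristics are treated as numeric scores; neither choice is prescribed by the brief. It uses glm() from base R and lda() from the MASS package. The names model_data, logit_fit and lda_fit are illustrative. Subset selection and the LASSO are not shown.

```r
library(mlbench)
library(MASS)  # provides lda() and qda()

data(BreastCancer)

# Assumption: drop incomplete rows and treat the nine scores as numeric
bc <- na.omit(BreastCancer)
predictors <- setdiff(names(bc), c("Id", "Class"))
model_data <- data.frame(
  lapply(bc[predictors], function(x) as.numeric(as.character(x))),
  Class = bc$Class
)

# Logistic regression on all nine predictors;
# with a factor response, glm() models the second level (malignant)
logit_fit <- glm(Class ~ ., data = model_data, family = binomial)
print(round(coef(summary(logit_fit)), 3))

# Linear discriminant analysis; $means holds the estimated group means
lda_fit <- lda(Class ~ ., data = model_data)
print(lda_fit$means)
```

Replacing lda() with qda() gives the corresponding QDA fit, whose group means are stored in the same $means component.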