

Statistics

You have to submit 2 files:

1. Answer Report: In this, you need to submit all the answers to all the questions in a sequential manner. It should include a detailed explanation of the approach used, insights, inferences, all outputs of codes like graphs, tables, etc. Your report should not be filled with codes. You will be evaluated based on the business report.
2. Jupyter Notebook file: This is a must and will be used for reference while evaluating.

Problem 1: Clustering

The dataset given is about the health and economic conditions in different States of a country. Group the States based on how similar their situations are, so that these groups can be provided to the government and appropriate measures can be taken to improve their health and economic conditions.

Questions:

1. Read the data and do exploratory data analysis. Describe the data briefly. (Check the null values, data types, shape, EDA, etc.)
2. Do you think scaling is necessary for clustering in this case? Justify.
3. Apply hierarchical clustering to scaled data. Identify the optimum number of clusters using a dendrogram and briefly describe them.
4. Apply K-Means clustering on scaled data and determine the optimum number of clusters. Apply the elbow curve and find the silhouette score.
5. Describe cluster profiles for the clusters defined. Recommend different priority-based actions that need to be taken for different clusters on the basis of their vulnerability according to their economic and health conditions.
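Questions 2 to 5 can be sketched roughly as follows. This is a minimal illustration, not a model answer: the data is synthetic (two made-up groups of rows generated with NumPy), the column names are taken from the data dictionary and may not match the CSV exactly, and Ward linkage and the range of k values are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: two made-up groups of 20 "states" each.
rng = np.random.default_rng(0)
low = rng.normal(loc=[1000, 50, 900, 10000], scale=[100, 5, 90, 1000], size=(20, 4))
high = rng.normal(loc=[4000, 150, 3000, 40000], scale=[100, 5, 90, 1000], size=(20, 4))
df = pd.DataFrame(
    np.vstack([low, high]),
    columns=["Health_indeces1", "Health_indeces2", "Per_capita_income", "GDP"],
)

# Question 2: z-score scaling, so large-valued columns such as GDP
# do not dominate the Euclidean distances used by both algorithms.
scaled = StandardScaler().fit_transform(df)

# Question 3: hierarchical clustering (Ward linkage is an assumption here).
# scipy.cluster.hierarchy.dendrogram(Z) would draw the tree from this matrix.
Z = linkage(scaled, method="ward")
hier_labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters

# Question 4: K-Means for several k; inertia feeds the elbow curve,
# and the silhouette score gives a second view of cluster quality.
inertia, silhouette = {}, {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scaled)
    inertia[k] = km.inertia_
    silhouette[k] = silhouette_score(scaled, km.labels_)
best_k = max(silhouette, key=silhouette.get)

# Question 5: cluster profiles as per-cluster means on the original scale.
final = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(scaled)
profiles = df.assign(cluster=final.labels_).groupby("cluster").mean()
print(profiles.round(1))
```

On the real dataset the elbow and silhouette results may disagree, in which case the choice of k should be justified in the report rather than taken from `best_k` alone.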

Data Dictionary for State_wise_Health_income:

1. States: names of States
2. Health_indeces1: A composite index rolls several related measures (indicators) into a single score that provides a summary of how the health system is performing in the State.
3. Health_indeces2: A composite index rolls several related measures (indicators) into a single score that provides a summary of how the health system is performing in certain areas of the States.
4. Per_capita_income: Per capita income (PCI) measures the average income earned per person in a given area (city, region, country, etc.) in a specified year. It is calculated by dividing the area's total income by its total population.
5. GDP: GDP provides an economic snapshot of a country/state, used to estimate the size of an economy and growth rate.

Dataset for Problem 1: State_wise_Health_income.csv
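The checks asked for in Question 1 (shape, data types, null values, summary statistics) might look like the sketch below. The three rows are made up purely so the example runs on its own, and the header row is assumed from the data dictionary; with the real file, `pd.read_csv("State_wise_Health_income.csv")` would replace the in-memory text.

```python
import io

import pandas as pd

# Made-up rows standing in for State_wise_Health_income.csv (hypothetical values).
sample_csv = io.StringIO(
    "States,Health_indeces1,Health_indeces2,Per_capita_income,GDP\n"
    "State_A,417,66,564,1823\n"
    "State_B,1485,646,2710,73662\n"
    "State_C,654,,1693,28260\n"
)
df = pd.read_csv(sample_csv)

shape = df.shape                    # (rows, columns)
dtypes = df.dtypes                  # data type of each column
null_counts = df.isnull().sum()     # missing values per column
duplicates = df.duplicated().sum()  # fully duplicated rows
summary = df.describe()             # count, mean, std, quartiles of numeric columns

print(shape)
print(null_counts.to_dict())
```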

Problem 2: CART-RF-ANN

Mortality Outcomes for Females Suffering Myocardial Infarction

The mifem data frame has 1295 rows and 10 columns. This is a dataset of females having coronary heart disease (CHD). Using the given information, predict whether the patient is dead or alive, in order to discover the factors that should be considered crucial in the treatment of the disease. Use CART, RF, and ANN, and compare the models' performance on the train and test sets.

1. Data Ingestion: Read the dataset. Compute descriptive statistics, check for null values, and write an inference on the results.
2. Encode the data (columns having string values) for modelling. Data Split: Split the data into train and test sets and build the classification models: CART, Random Forest, and Artificial Neural Network.
3. Performance Metrics: Check the performance of predictions on the train and test sets using accuracy and the confusion matrix, plot the ROC curve, and get the ROC_AUC score for each model.
4. Final Model: Compare all the models and write an inference on which model is best/optimized.
5. Inference: Based on these predictions, what are the insights and recommendations?
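One possible shape for the model comparison, assuming scikit-learn is used, is sketched below. The data comes from `make_classification`, not from mifem.csv, and every hyperparameter (tree depth, number of trees, hidden layer size) is an illustrative assumption rather than a tuned value.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the encoded mifem features.
X, y = make_classification(n_samples=600, n_features=9, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "CART": DecisionTreeClassifier(max_depth=4, random_state=0),
    "RF": RandomForestClassifier(n_estimators=200, max_depth=6, random_state=0),
    # Neural networks are sensitive to feature scale, so the ANN is wrapped
    # in a pipeline that standardizes the inputs first.
    "ANN": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {}
    for split, (Xs, ys) in {"train": (X_train, y_train), "test": (X_test, y_test)}.items():
        pred = model.predict(Xs)
        proba = model.predict_proba(Xs)[:, 1]  # probability of the positive class
        results[name][split] = {
            "accuracy": accuracy_score(ys, pred),
            "roc_auc": roc_auc_score(ys, proba),
            "confusion_matrix": confusion_matrix(ys, pred),
        }

for name, res in results.items():
    print(name, round(res["train"]["roc_auc"], 3), round(res["test"]["roc_auc"], 3))
```

A large gap between train and test scores for a model suggests overfitting, which is worth noting when deciding which model is best. The ROC curves themselves could be drawn from `sklearn.metrics.roc_curve` using the same predicted probabilities.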

Dataset for Problem 2: mifem.csv

Data Dictionary for mifem.csv :

1. Outcome: mortality outcome: a factor with levels live, dead
2. Age: age at onset
3. Yronset: year of onset (The year of onset is the year on which an individual acquires, develops, or first experiences a condition or symptoms of a disease or disorder)
4. Premi: previous myocardial infarction event, a factor with levels y, n, nk not known
5. Smstat: smoking status, a factor with levels c current, x ex-smoker, n non-smoker, nk not known
6. Diabetes: a factor with levels y, n, nk not known
7. Highbp: high blood pressure, a factor with levels y, n, nk not known
8. Hichol: high cholesterol, a factor with levels y, n for yes and no
9. Angina: a factor with levels y, n, nk not known
10. Stroke: a factor with levels y, n, nk not known
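The string-valued factors above need to be encoded before modelling. A minimal sketch with pandas follows; the four rows are invented for illustration, only a subset of the columns is shown, and the column names follow the data dictionary (the capitalization in the actual file may differ). One-hot encoding is one reasonable choice here because it keeps "nk" (not known) as its own category instead of imposing an order on the levels.

```python
import pandas as pd

# Invented rows using a subset of the mifem columns (hypothetical values).
df = pd.DataFrame({
    "Outcome": ["live", "dead", "live", "live"],
    "Age": [63, 55, 68, 64],
    "Yronset": [85, 85, 86, 91],
    "Premi": ["n", "y", "nk", "n"],
    "Smstat": ["x", "c", "nk", "n"],
    "Hichol": ["y", "n", "n", "y"],
})

# Binary target: 1 = dead, 0 = live.
y = (df["Outcome"] == "dead").astype(int)

# One-hot encode the categorical predictors; numeric columns pass through unchanged.
X = pd.get_dummies(
    df.drop(columns="Outcome"),
    columns=["Premi", "Smstat", "Hichol"],
    dtype=int,
)
print(X.columns.tolist())
```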

Total Points: 60.0
