You have to submit 2 files:

Answer Report: In this file, submit the answers to all the questions in sequence. It should include a detailed explanation of the approach used, insights, inferences, and all code outputs such as graphs and tables. The report should not be filled with code. You will be evaluated on the basis of this business report.
Jupyter Notebook file: This is mandatory and will be used for reference during evaluation.

Problem 1: Clustering

The given dataset describes the health and economic conditions of different States of a country. Group the States based on how similar their situations are, so that these groups can be provided to the government and appropriate measures can be taken to improve their health and economic conditions.
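Grouping States by how similar their situations are is what agglomerative (hierarchical) clustering does: every State starts as its own cluster, and the two most similar clusters are merged repeatedly. The minimal sketch below, in plain Python so it runs without extra packages, uses Ward linkage (merge the pair that least increases the within-cluster sum of squares) and stops once `k` clusters remain. The function names and toy rows are hypothetical; the rows are made-up, already-scaled values, not data from State_wise_Health_income.csv. The sketch does not record merge heights, so it cannot draw a dendrogram; in the notebook, `scipy.cluster.hierarchy.linkage(X, method="ward")` together with `dendrogram` and `fcluster` would be the usual tools for building and cutting one.

```python
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def ward_cost(a, b):
    """Increase in within-cluster sum of squares caused by merging clusters a and b."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(data, k):
    """Naive O(n^3) agglomerative clustering with Ward linkage.

    Returns one integer cluster label per row of `data` once k clusters remain.
    """
    clusters = [[i] for i in range(len(data))]  # every row starts alone
    while len(clusters) > k:
        # Pick the pair of clusters whose merge is cheapest under Ward's criterion.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: ward_cost(
                [data[r] for r in clusters[pair[0]]],
                [data[r] for r in clusters[pair[1]]],
            ),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]  # combinations yields i < j, so index i stays valid
    labels = [0] * len(data)
    for label, members in enumerate(clusters):
        for r in members:
            labels[r] = label
    return labels


# Hypothetical, already-scaled rows (e.g. a health index and an income measure).
toy = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0), (10.0, 0.0), (10.1, 0.1)]
print(ward_clusters(toy, 3))  # -> [0, 0, 1, 1, 2, 2]
```

Ward linkage is shown because it tends to produce compact, similarly sized groups; other linkages (average, complete) only change the cost function.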

Questions:

1. Read the data and do exploratory data analysis. Describe the data briefly. (Check the null values, data types, shape, EDA, etc.)
2. Do you think scaling is necessary for clustering in this case? Justify.
3. Apply hierarchical clustering to scaled data. Identify the optimum number of clusters using a dendrogram and briefly describe them.
4. Apply K-Means clustering on scaled data and determine the optimum number of clusters. Use the elbow curve and find the silhouette score.
5. Describe the cluster profiles for the clusters defined. Recommend different priority-based actions that need to be taken for different clusters on the basis of their vulnerability according to their economic and health conditions.
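The K-Means part of question 4 can be sketched in plain Python as follows. Lloyd's algorithm returns the within-cluster sum of squares (WCSS); computing it for k = 1, 2, 3, ... gives the values to plot as the elbow curve. The silhouette coefficient compares each row's mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b) as s = (b - a) / max(a, b), averaged over all rows. This is a hedged sketch, not a reference implementation: the function names and toy rows are hypothetical, and it uses a single random initialisation, whereas scikit-learn's `KMeans` defaults to k-means++ initialisation and exposes restarts through `n_init`. In the notebook, `sklearn.cluster.KMeans` (its `inertia_` is the WCSS) and `sklearn.metrics.silhouette_score` would normally be used.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(data, k, n_iter=100, seed=0):
    """Lloyd's algorithm with randomly chosen initial centroids.

    Returns (labels, wcss), where wcss is the within-cluster sum of squares.
    """
    rng = random.Random(seed)
    centroids = rng.sample(data, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in data]
        if new_labels == labels:
            break  # converged: no row changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(data, labels))
    return labels, wcss


def silhouette(data, labels):
    """Mean silhouette coefficient with Euclidean distance; needs at least 2 clusters."""
    scores = []
    for i, p in enumerate(data):
        same = [math.dist(p, q) for j, q in enumerate(data) if j != i and labels[j] == labels[i]]
        if not same:  # singleton cluster: coefficient defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(data, labels) if lab == c) / labels.count(c)
            for c in set(labels)
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical scaled rows forming two obvious groups.
toy = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.2)]
elbow = [kmeans(toy, k)[1] for k in range(1, 5)]  # WCSS for k = 1..4; plot against k
labels, _ = kmeans(toy, 2)
print(elbow, silhouette(toy, labels))
```

The "elbow" is the k after which WCSS stops dropping sharply; the silhouette score (closer to 1 is better) is a useful cross-check when the elbow is ambiguous.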

Data Dictionary for State_wise_Health_income:

1. States: names of States
2. Health_indeces1: A composite index that rolls several related measures (indicators) into a single score summarizing how the health system is performing in the State.
3. Health_indeces2: A composite index that rolls several related measures (indicators) into a single score summarizing how the health system is performing in certain areas of the States.
4. Per_capita_income: Per capita income (PCI) measures the average income earned per person in a given area (city, region, country, etc.) in a specified year. It is calculated by dividing the area's total income by its total population.
5. GDP: GDP provides an economic snapshot of a country/state and is used to estimate the size of an economy and its growth rate.
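The dictionary mixes composite health scores with monetary quantities (per capita income and GDP), so the columns are likely to sit on very different numeric ranges; distance-based methods such as hierarchical clustering and K-Means would then be dominated by the largest-valued columns. A common remedy, shown as a minimal plain-Python sketch, is z-score standardisation: subtract each column's mean and divide by its standard deviation. The function name and toy rows are hypothetical (made-up numbers, not values from State_wise_Health_income.csv); `sklearn.preprocessing.StandardScaler` performs the same transformation and also uses the population standard deviation.

```python
import math
import statistics


def zscore_columns(rows):
    """Standardise every column to mean 0 and population standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        # A constant column (std 0) is mapped to 0.0 instead of dividing by zero.
        tuple((x - m) / s if s else 0.0 for x, m, s in zip(row, means, stds))
        for row in rows
    ]


# Made-up rows: (health index, per capita income). Not from the real dataset.
raw = [(410.0, 5000.0), (452.0, 8000.0), (380.0, 2000.0), (430.0, 9000.0)]
scaled = zscore_columns(raw)

# Before scaling, the income column dominates the Euclidean distance;
# after scaling, both columns have unit variance and contribute comparably.
print(math.dist(raw[0], raw[1]), math.dist(scaled[0], scaled[1]))
```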

Dataset for Problem 1: State_wise_Health_income.csv

Problem 2: CART-RF-ANN

Mortality Outcomes for Females Suffering Myocardial Infarction

The mifem data frame has 1295 rows and 10 columns. It is a dataset of females with coronary heart disease (CHD). Using the given information, predict whether a female is dead or alive, so as to discover important factors that should be considered crucial in the treatment of the disease. Use CART, RF, and ANN, and compare the models' performances on the train and test sets.
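Comparing the models on train and test sets requires holding out part of the data before any model is fitted; a large gap between train and test scores usually signals overfitting. A minimal sketch of a stratified split, which keeps the live/dead proportions the same in both parts, is shown below in plain Python. The 70/30 ratio, the seed, the function name, and the sample outcome column are assumptions, since the brief does not fix them. In the notebook, `sklearn.model_selection.train_test_split(X, y, test_size=0.3, stratify=y, random_state=1)` does the same job.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.3, seed=1):
    """Split row indices into (train, test), keeping each class's share the same.

    `labels` is the outcome column (for example "live"/"dead"); the rows of each
    class are shuffled and the first round(test_size * class_count) go to test.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical outcome column with 70 "live" and 30 "dead" rows.
outcome = ["live"] * 70 + ["dead"] * 30
train_idx, test_idx = stratified_split(outcome)
print(len(train_idx), len(test_idx))  # 70 30
```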

1. Data Ingestion: Read the dataset. Compute the descriptive statistics, check for null values, and write an inference on them.
2. Encode the data (columns having string values) for modelling. Data Split: Split the data into train and test sets, and build the classification models CART, Random Forest, and Artificial Neural Network.
3. Performance Metrics: Check the performance of predictions on the train and test sets using accuracy and the confusion matrix, plot the ROC curve, and get the ROC_AUC score for each model.
4. Final Model: Compare all the models and write an inference on which model is best/optimized.
5. Inference: Based on these predictions, what are the insights and recommendations?
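The metrics named in task 3 can be computed from predicted labels and predicted probabilities. The plain-Python sketch below is a hedged illustration that assumes the outcome has been encoded as 1 = dead and 0 = live (an arbitrary choice) and uses made-up scores. The confusion matrix follows scikit-learn's layout [[TN, FP], [FN, TP]]; the ROC curve is traced by lowering the decision threshold through every distinct score; and ROC_AUC is computed as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count one half), which equals the area under that curve. The function names are hypothetical; in the notebook, `sklearn.metrics` provides `accuracy_score`, `confusion_matrix`, `roc_curve`, and `roc_auc_score` for the same quantities.

```python
def confusion_matrix(y_true, y_pred):
    """Counts [[TN, FP], [FN, TP]] for labels coded 0/1 (rows = actual, columns = predicted)."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


def accuracy(y_true, y_pred):
    """Share of rows whose predicted label equals the actual label."""
    return sum(a == p for a, p in zip(y_true, y_pred)) / len(y_true)


def roc_curve_points(y_true, scores):
    """(FPR, TPR) points obtained by lowering the threshold through every distinct score."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, a in zip(scores, y_true) if s >= threshold and a == 1)
        fp = sum(1 for s, a in zip(scores, y_true) if s >= threshold and a == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under(points):
    """Trapezoidal area under a piecewise-linear curve given as (x, y) points."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


def roc_auc(y_true, scores):
    """P(random positive scored above random negative), ties counting one half."""
    pos = [s for s, a in zip(scores, y_true) if a == 1]
    neg = [s for s, a in zip(scores, y_true) if a == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = dead, 0 = live; scores are predicted probabilities of death.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in scores]  # 0.5 cut-off, an assumption

print(confusion_matrix(y_true, y_pred))  # [[2, 0], [1, 1]]
print(accuracy(y_true, y_pred))  # 0.75
print(roc_auc(y_true, scores), area_under(roc_curve_points(y_true, scores)))  # 0.75 0.75
```

Running the same functions on both the train and the test predictions of each model gives the side-by-side comparison that task 4 asks for.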

Dataset for Problem 2: mifem.csv

Data Dictionary for mifem.csv:

Outcome: mortality outcome, a factor with levels live, dead
Age: age at onset
Yronset: year of onset (the year in which an individual acquires, develops, or first experiences a condition or symptoms of a disease or disorder)
Premi: previous myocardial infarction event, a factor with levels y, n, nk (not known)
Smstat: smoking status, a factor with levels c (current), x (ex-smoker), n (non-smoker), nk (not known)
Diabetes: a factor with levels y, n, nk (not known)
Highbp: high blood pressure, a factor with levels y, n, nk (not known)
Hichol: high cholesterol, a factor with levels y, n (yes and no)
Angina: a factor with levels y, n, nk (not known)
Stroke: a factor with levels y, n, nk (not known)
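Apart from Age and Yronset, the columns above are factors stored as strings, so they need encoding before CART, RF, or an ANN can use them. One reasonable scheme, sketched below in plain Python, maps the target to 0/1 and one-hot encodes the other factors, keeping nk (not known) as its own level. Because nk is a string, a plain null-value check will not count it as missing. The lowercase column names, the dead = 1 coding, the helper names, and the sample records are assumptions for illustration; `pandas.get_dummies` is the usual pandas tool for the one-hot step.

```python
# Levels taken from the data dictionary above; lowercase column names are assumed.
FACTOR_LEVELS = {
    "premi": ["y", "n", "nk"],
    "smstat": ["c", "x", "n", "nk"],
    "diabetes": ["y", "n", "nk"],
    "highbp": ["y", "n", "nk"],
    "hichol": ["y", "n"],
    "angina": ["y", "n", "nk"],
    "stroke": ["y", "n", "nk"],
}
OUTCOME_CODES = {"live": 0, "dead": 1}  # positive class = dead (an assumption)


def encode_record(rec):
    """Return a numeric copy of one row: outcome as 0/1, factors one-hot encoded."""
    out = {}
    for column, value in rec.items():
        if column == "outcome":
            out[column] = OUTCOME_CODES[value]
        elif column in FACTOR_LEVELS:
            for level in FACTOR_LEVELS[column]:
                out[f"{column}_{level}"] = int(value == level)
        else:  # numeric columns such as age and yronset pass through unchanged
            out[column] = value
    return out


# Hypothetical rows, not taken from mifem.csv.
rows = [
    {"outcome": "dead", "age": 63, "smstat": "c", "hichol": "y"},
    {"outcome": "live", "age": 55, "smstat": "nk", "hichol": "n"},
]
encoded = [encode_record(r) for r in rows]
print(encoded[1])
```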

Criteria	Pts
1.1 Read the data and do exploratory data analysis. Describe the data briefly. (Check the null values, data types, shape, EDA, etc.)	5.0 pts
1.2 Do you think scaling is necessary for clustering in this case? Justify.	5.0 pts
1.3 Apply hierarchical clustering to scaled data. Identify the optimum number of clusters using a dendrogram and briefly describe them.	7.5 pts
1.4 Apply K-Means clustering on scaled data and determine the optimum number of clusters. Use the elbow curve and find the silhouette score.	7.5 pts
1.5 Describe the cluster profiles for the clusters defined. Recommend different priority-based actions that need to be taken for different clusters on the basis of their vulnerability according to their economic and health conditions.	5.0 pts
2.1 Data Ingestion: Read the dataset. Compute the descriptive statistics, check for null values, and write an inference on them.	5.0 pts
2.2 Encode the data (columns having string values) for modelling. Data Split: Split the data into train and test sets, and build the classification models CART, Random Forest, and Artificial Neural Network.	7.5 pts
2.3 Performance Metrics: Check the performance of predictions on the train and test sets using accuracy and the confusion matrix, plot the ROC curve, and get the ROC_AUC score for each model.	7.5 pts
2.4 Final Model: Compare all the models and write an inference on which model is best/optimized.	5.0 pts
2.5 Inference: Based on these predictions, what are the insights and recommendations?	5.0 pts
Total Points: 60.0