Part 2 questions: Model building and interpretation
a. Build various models (you can choose to build models for any or all of descriptive, predictive or prescriptive purposes)
b. Test your predictive model against the test set using appropriate performance metrics
c. Interpretation of the model(s) - 10 marks
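As a minimal sketch of item b, assuming the target is continuous (the feedback in Part 1 treats this as a regression problem), the usual test-set metrics can be computed as below. The function name `regression_metrics` and the toy values are made up for illustration and are not from the assignment data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 for paired actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Toy held-out values, invented for illustration only
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

MAE and RMSE are in the units of the target, which makes them easier to explain to the business; R^2 is unitless and is useful for comparing models on the same test set.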
a. Ensemble modelling, wherever applicable
b. Any other model tuning measures (if applicable)
c. Interpretation of the optimal model and its implications for the business - 10 marks
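One possible way to approach items a and b is sketched below with scikit-learn: a random forest is tuned with a small grid search and then averaged with a linear model in a voting ensemble. The data is synthetic (`make_regression`), and the choice of models, the grid values and the scoring metric are all assumptions made for illustration, not requirements of the assignment.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the project data (assumption, not the real dataset)
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Model tuning: small, illustrative grid searched with 3-fold cross-validation
param_grid = {"n_estimators": [50, 100], "max_depth": [4, None]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)

# Ensemble: average the predictions of a linear model and the tuned forest
ensemble = VotingRegressor(
    [("linear", LinearRegression()), ("forest", search.best_estimator_)]
)
ensemble.fit(X_train, y_train)

# Evaluate only once on the held-out test set
test_r2 = r2_score(y_test, ensemble.predict(X_test))
print(search.best_params_, round(test_r2, 3))
```

Tuning is done with cross-validation on the training split only, so the test set stays untouched until the final comparison of models.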
Standard Instructions for Business Report:
PART 1
FEEDBACK: Kindly go through this and implement it in Part 2.
a) Defining problem statement b) Need of the study/project c) Understanding business/social opportunity
FB: Need to elaborate more on the significance of demand planning, supply chain management and optimisation techniques, supported with facts and figures from industry sources.
a) Univariate analysis (distribution and spread for every continuous attribute, distribution of data in categories for categorical ones) b) Bivariate analysis (relationship between different variables, correlations)
a) Removal of unwanted variables (if applicable) b) Missing value treatment (if applicable) d) Outlier treatment (if required) e) Variable transformation (if applicable) f) Addition of new variables (if required)
FB: The purpose of univariate analysis is to find out which variables show clear separation for the target variable: separation of the mean and median of continuous variables, and how their skewness affects the target variable. Bivariate analysis establishes the relationships among the independent variables and between them and the dependent variable. Specific statistical/business insights are missing. Variables with a linear relationship in the pairplots should have been identified and mentioned. For the correlation heatmap, the variables showing multicollinearity should have been identified. The heatmap and pairplot are not drawn properly. wh_est_year should be converted to the age of the warehouse, since age plays a significant role in warehouse-related issues. Missing values for certificates should be treated as "applied but not received from govt". A significance test should have been done for wh_est_year; if it turns out to be significant, KNN imputation could have been used. wh_govt_certification is categorical, so mode imputation should have been used. A significance test (ANOVA) should have been carried out for the categorical variables against the target variable. Flood proof, flood impacted and electricity supply are categorical variables, so distribution plots, outlier treatment and correlation should not be applied to them. The variable transformation (encoding of categorical variables) should have been done and documented, with the methodology specified for each variable.
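A few of the steps named in this feedback can be sketched with pandas and SciPy as follows. Only the column names wh_est_year and wh_govt_certification come from the feedback; the `target` column, the reference year and all values are hypothetical and invented for illustration. Both treatments of the missing certificates mentioned above (a separate "applied but not received" category, and mode imputation) are shown side by side.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

REFERENCE_YEAR = 2023  # hypothetical; use the year the data was collected

# Toy data, invented for illustration only
df = pd.DataFrame({
    "wh_est_year": [1998, 2005, np.nan, 2010, 2001, np.nan, 2015, 1995],
    "wh_govt_certification": ["A", "A", "A", "B", "B", None, None, "A"],
    "target": [30.0, 28.0, 31.0, 22.0, 20.0, 25.0, 24.0, 29.0],
})

# Convert establishment year to warehouse age (missing years stay missing)
df["wh_age"] = REFERENCE_YEAR - df["wh_est_year"]

# Option 1: treat a missing certificate as its own category
df["cert_flagged"] = df["wh_govt_certification"].fillna("applied_not_received")

# Option 2: mode imputation for the categorical variable
mode_value = df["wh_govt_certification"].mode()[0]
df["cert_mode"] = df["wh_govt_certification"].fillna(mode_value)

# One-way ANOVA: does the mean of the target differ across certificate groups?
groups = [g["target"].to_numpy() for _, g in df.groupby("cert_flagged")]
f_stat, p_value = f_oneway(*groups)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

A small p-value would suggest that the categorical variable is related to the target and is worth keeping; which of the two imputation options is appropriate depends on what a missing certificate means in the business context.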
a) Is the data unbalanced? If so, what can be done? Please explain in the context of the business b) Any business insights using clustering (if applicable) c) Any other business insights
FB: Checking for data imbalance is critical for classification problems, but less so for linear regression. Even so, any imbalance (too few data points) specific to a variable or segment could have been checked and mentioned here, e.g. wh_est_year has more than 40% missing data. Imbalance is most common and most critical in classification, but that does not mean it is inapplicable to regression. It applies where observations are too few in a particular segment or class to fit a regression, and to any segment where missing values are high or extreme values sit at either end. The concept of imbalance in regression exists but is rarely used at the machine learning stage; it is critical for deep learning. Just as SMOTE is applied to address imbalance in classification problems, SMOTER is used for regression problems. Here we do not have to apply it or take any action, but conceptually these points have to be mentioned. I am sharing a few blogs for reference: https://analyticsindiamag.com/deep-imbalanced-regression-complete-guide/ https://www.kaggle.com/questions-and-answers/34328 Warehouses should have been clustered based on demand. The business report should have a list of contents (index) with page numbering, and a list of tables and figures with numbering.
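The checks described in this feedback (share of missing values per variable, and segments with too few observations) could be sketched in pandas as below. The `zone` and `target` columns, the minimum row count and all values are hypothetical; the 40% threshold simply echoes the figure quoted above for wh_est_year.

```python
import numpy as np
import pandas as pd

MISSING_THRESHOLD = 0.40   # echoes the "more than 40% missing" remark
MIN_ROWS_PER_SEGMENT = 2   # hypothetical cut-off for a "sparse" segment

# Toy data, invented for illustration only
df = pd.DataFrame({
    "zone": ["North", "North", "North", "South", "South", "East"],
    "wh_est_year": [2001, np.nan, np.nan, 2010, np.nan, 1999],
    "target": [30.0, 28.0, 31.0, 22.0, 20.0, 25.0],
})

# Share of missing values per column, highest first
missing_share = df.isna().mean().sort_values(ascending=False)
high_missing = missing_share[missing_share > MISSING_THRESHOLD]

# Segments with too few rows to support a reliable regression
segment_counts = df["zone"].value_counts()
sparse_segments = segment_counts[segment_counts < MIN_ROWS_PER_SEGMENT]

print(high_missing)
print(sparse_segments)
```

Reporting these two tables in the business report is one way to document the imbalance discussion without applying any resampling.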
Please download the answer file using this link
https://drive.google.com/file/d/102XBBdEEXHwbgpS6dOUpT-AqNUHHZ46R/view?usp=share_link