
Part 1: Machine Learning Models You work for an office transport company

Statistics


Part 1: Machine Learning Models

You work for an office transport company that is in discussions with ABC Consulting to provide transport for its employees. You are tasked with understanding how the employees of ABC Consulting currently prefer to commute between home and office. Using parameters such as age, salary and work experience from the data set ‘Transport.csv’, you are required to predict the preferred mode of transport. The project requires you to build several Machine Learning models and compare them so that one model can be finalised.

Data Dictionary

Age : Age of the Employee in Years
Gender : Gender of the Employee
Engineer : For Engineer = 1, Non Engineer = 0
MBA : For MBA = 1, Non MBA = 0
Work Exp : Work Experience in Years
Salary : Salary in Lakhs per Annum
Distance : Distance in Kms from Home to Office
license : If Employee has a Driving Licence = 1, If not = 0
Transport : Mode of Transport

The objective is to build various Machine Learning models on this data set and, based on the accuracy metrics, decide which model should be finalised for predicting the mode of transport chosen by the employee.

Questions:

Produce a basic data summary with univariate and bivariate analysis and graphs, check correlations, treat outliers and missing values (if necessary), and check the basic descriptive statistics of the dataset.
Split the data into train and test sets in the ratio 70:30. Is scaling necessary or not?
Build the following models on the 70% training data and check their performance on the training data as well as on the 30% test data, using inferences from the Confusion Matrix and plotting an AUC-ROC curve along with the AUC values. Tune the models wherever required for optimum performance:

Logistic Regression Model
Linear Discriminant Analysis
Decision Tree Classifier – CART model
Naïve Bayes Model
KNN Model
Random Forest Model
Boosting Classifier Model using Gradient Boosting

Which model performs the best?
What are your business insights?
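The split, scaling and evaluation steps in the questions above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the expected solution: the sample rows are hypothetical and are not taken from ‘Transport.csv’, and the helper names (`split_70_30`, `fit_scaler`, `scale`, `confusion_matrix`, `roc_auc`) are made up for this sketch. In a real project scikit-learn's `train_test_split`, `StandardScaler` and `roc_auc_score` would normally do this work. The scaler is fitted on the training rows only, so no information from the test rows leaks into training.

```python
import random
from statistics import mean, pstdev


def split_70_30(rows, seed=42):
    """Shuffle a copy of the rows and split it 70:30 into train and test."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def fit_scaler(train_values):
    """Learn the mean and standard deviation from the training values only."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # guard against a constant column
    return mu, sigma


def scale(values, mu, sigma):
    """Standardise values with statistics learned on the training set."""
    return [(v - mu) / sigma for v in values]


def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Requires at least one label of each class.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical (salary, distance) pairs, invented for this sketch.
rows = [(6.5, 5.1), (8.0, 7.2), (13.4, 10.9), (36.6, 14.4), (38.9, 17.0),
        (7.5, 6.3), (9.9, 8.8), (15.2, 11.5), (22.8, 12.7), (11.0, 9.4)]

train, test = split_70_30(rows)
mu, sigma = fit_scaler([salary for salary, _ in train])
train_salary = scale([salary for salary, _ in train], mu, sigma)
test_salary = scale([salary for salary, _ in test], mu, sigma)
```

On the scaling question, distance-based models such as KNN are sensitive to feature scale, because a feature measured in large units (Salary) can dominate one measured in small units (Distance). Tree-based models such as CART, Random Forest and Gradient Boosting split on thresholds and are largely unaffected.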

Part 2: Text Mining

A dataset of Shark Tank episodes is made available. It covers 495 entrepreneurs pitching to the VC sharks.

You will ONLY use the “Description” column for the initial text mining exercise.

Extract the Deal (Dependent Variable) and Description columns into a separate data frame.
Create two corpora, one for those who secured a Deal and the other for those who did not.
The following exercise is to be done for both corpora:

Find the number of characters in each corpus.
Remove Stop Words from the corpora. (Words like ‘also’, ‘made’, ‘makes’, ‘like’, ‘this’, ‘even’ and ‘company’ are to be removed.)
What are the top 3 most frequently occurring words in each corpus (after removing stop words)?
Plot the Word Cloud for both corpora.

Refer to both the word clouds. What do you infer?
Based on the word clouds, is it true that entrepreneurs who introduced devices are less likely to secure a deal?
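The corpus steps above (splitting by Deal, counting characters, removing stop words and finding the most frequent words) can be sketched with the Python standard library alone. This is an illustration under stated assumptions rather than the expected solution: the three sample records are invented, `BASE_STOP_WORDS` is a deliberately tiny stand-in for a full stop-word list such as the one shipped with NLTK, and the function names are hypothetical. The custom stop words are the ones listed in the exercise. Plotting the word cloud itself needs a third-party package and is left out.

```python
import re
from collections import Counter

# Custom stop words named in the exercise.
CUSTOM_STOP_WORDS = {"also", "made", "makes", "like", "this", "even", "company"}
# Tiny illustrative base list (assumption); a real analysis would use a full list.
BASE_STOP_WORDS = {"a", "an", "and", "the", "of", "to", "for", "that", "is", "in", "with"}
STOP_WORDS = BASE_STOP_WORDS | CUSTOM_STOP_WORDS


def build_corpora(records):
    """Join descriptions into two corpora: deal secured and deal not secured."""
    deal = " ".join(text for got_deal, text in records if got_deal)
    no_deal = " ".join(text for got_deal, text in records if not got_deal)
    return deal, no_deal


def remove_stop_words(text):
    """Lowercase the text, tokenise on letters and apostrophes, drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def top_words(text, n=3):
    """Return the n most frequent remaining words with their counts."""
    return Counter(remove_stop_words(text)).most_common(n)


# Hypothetical (Deal, Description) records, invented for this sketch.
records = [
    (True, "A company that makes organic snacks for kids"),
    (True, "Organic snacks made for busy kids"),
    (False, "This device tracks sleep and the device also tracks steps"),
]

deal_corpus, no_deal_corpus = build_corpora(records)
deal_chars = len(deal_corpus)        # number of characters, spaces included
no_deal_chars = len(no_deal_corpus)
deal_top3 = top_words(deal_corpus)
no_deal_top3 = top_words(no_deal_corpus)
```

The token counts returned by `Counter` can also be passed to a word cloud library as frequencies, so both the top-3 table and the plot come from the same cleaned tokens.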


Answer Preview

Please find the answer using this link

https://drive.google.com/file/d/1GIpEm9RBPgPvBOVxxCdNR5QIx9uMlnDb/view?usp=sharing
