
Build machine learning models and prediction of parameters using kaggle Titanic dataset

Computer Science


• For questions that include programming, the Python language is to be used. The completed code must be submitted as a Python Notebook (.ipynb). The Notebook must include detailed comments for each step.
• The Notebook must be accompanied by a detailed write-up (to be submitted as a Portable Document File) describing what was done, the results, and the inferences drawn.
• The datasets must be preprocessed and cleaned before building models. The code for preprocessing the data is provided in the Python Notebook and can be used in your program after downloading the datasets.
• Including details about the errors/challenges encountered while doing the program will fetch additional credit.
• A couple of references on ensemble learning have been shared. Please feel free to refer to any other materials available online to answer the question. The intention is simply to introduce the concept of ensembling and its different techniques, with the flexibility offered to select the base learners.
1. The Royal Mail Steamer (RMS) Titanic sank on 15 April 1912. The shipwreck resulted in 1502 deaths among the 2224 passengers and crew. On analysis, it was observed that certain groups of people had higher chances of survival than others. The Titanic dataset was released by Kaggle as part of a competition. The task is to build a predictive model to identify people who are more likely to survive. The dataset has been split into two parts: train.csv and test.csv. The training set, which includes the ground truth, can be used to build the model. You will be using the instances present in the file train.csv only. This is to be further split into sets for training and testing.
i. [8 points] Build a Logistic Regression model (with default parameters and maximum number of iterations set to 1000) to predict the chances of survival for each individual. Split the data in train.csv into training and testing sets in the ratio 80:20, with random seed value 78.
ii. [5 points] Calculate the testing accuracy value. Plot the ROC graph and calculate the AUC score.
iii. [5 points] Apply 10-fold cross validation (with random seed value 78) to build a Logistic Regression model (with default parameters and maximum number of iterations set to 1000).
iv. [2 points] Calculate the mean of the test accuracy scores. Compare it with the accuracy value obtained without using cross validation.
v. [10 points] Perform hyperparameter tuning using the grid search technique to find the best set of hyperparameters for the Logistic Regression model. The grid of parameters is given in Table 1. Provide the parameters of the best model obtained. Also calculate the test accuracy using the best model.
vi. [10 points] Perform hyperparameter tuning using the randomized search technique to find the best set of hyperparameters for the Logistic Regression model. The grid of parameters is given in Table 2. Provide the parameters of the best model obtained. Also calculate the test accuracy using the best model.
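As an illustration of parts i–iv, a minimal scikit-learn sketch follows. It assumes train.csv has already been cleaned and encoded into numeric features (for example by the preprocessing code provided in the Notebook) and that the target column is named Survived; those file and column names are assumptions for illustration, not requirements of the assignment.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score, KFold
from sklearn.metrics import accuracy_score, roc_curve, roc_auc_score

# Assumed: train.csv has already been preprocessed into numeric columns,
# with the ground-truth label in a "Survived" column.
df = pd.read_csv("train.csv")
X = df.drop(columns=["Survived"])
y = df["Survived"]

# i. 80:20 split with random seed 78; Logistic Regression with max_iter=1000
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=78)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# ii. Test accuracy, ROC curve and AUC score
y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]
print("Test accuracy:", accuracy_score(y_test, y_pred))
print("AUC:", roc_auc_score(y_test, y_prob))
fpr, tpr, _ = roc_curve(y_test, y_prob)
plt.plot(fpr, tpr)
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.title("ROC curve")
plt.show()

# iii./iv. 10-fold cross validation with seed 78; mean test accuracy
cv = KFold(n_splits=10, shuffle=True, random_state=78)
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print("Mean CV accuracy:", cv_scores.mean())
```

Using a KFold splitter with shuffle=True is one way to honour the seed-78 requirement for the cross-validation folds; other splitting strategies would also be acceptable.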
Parameter            Value
penalty              l2
tol                  0.0001, 0.0002, 0.0003
max_iter             1000, 2000, 3000
C                    1, 2, 3, 4
intercept_scaling    0.01, 0.1, 1, 10, 100
solver               liblinear, lbfgs
Table 1: Parameter settings for hyperparameter tuning using Grid Search 
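A possible GridSearchCV setup for part v, with the Table 1 values mapped onto scikit-learn's LogisticRegression parameter names, is sketched below. The use of 10 folds and accuracy scoring inside the search is an assumption, and X_train/X_test/y_train/y_test refer to the split from part i.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Grid built from Table 1 (parameter names as used by LogisticRegression)
param_grid = {
    "penalty": ["l2"],
    "tol": [0.0001, 0.0002, 0.0003],
    "max_iter": [1000, 2000, 3000],
    "C": [1, 2, 3, 4],
    "intercept_scaling": [0.01, 0.1, 1, 10, 100],
    "solver": ["liblinear", "lbfgs"],
}
grid = GridSearchCV(LogisticRegression(), param_grid,
                    cv=10, scoring="accuracy")
grid.fit(X_train, y_train)
print("Best parameters:", grid.best_params_)
print("Test accuracy of best model:",
      grid.best_estimator_.score(X_test, y_test))
```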
Parameter            Value
penalty              l2, l1
C                    uniform distribution with location = 0 and scale = 2
tol                  0.0001, 0.0002, 0.0003
max_iter             1000, 2000, 3000
intercept_scaling    1, 2, 3, 4
solver               liblinear
Table 2: Parameter settings for hyperparameter tuning using Randomized Search 
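Part vi can be sketched the same way with RandomizedSearchCV. Here the continuous C distribution from Table 2 is expressed with scipy.stats.uniform, while n_iter=50, cv=10 and random_state=78 are assumptions made for illustration rather than values stated in the assignment.

```python
from scipy.stats import uniform
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Distributions/lists built from Table 2
param_dist = {
    "penalty": ["l1", "l2"],
    "C": uniform(loc=0, scale=2),   # uniform distribution, location 0, scale 2
    "tol": [0.0001, 0.0002, 0.0003],
    "max_iter": [1000, 2000, 3000],
    "intercept_scaling": [1, 2, 3, 4],
    "solver": ["liblinear"],
}
rand = RandomizedSearchCV(LogisticRegression(), param_dist,
                          n_iter=50, cv=10, scoring="accuracy",
                          random_state=78)
rand.fit(X_train, y_train)
print("Best parameters:", rand.best_params_)
print("Test accuracy of best model:",
      rand.best_estimator_.score(X_test, y_test))
```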
vii. [3 points] Compute the following values for the models built in 1i, 1v and 1vi:
• Precision and Recall
• F1-score
• Macro, micro and weighted averages of precision, recall and F1-score
viii. [2 points] Comment on the models built with and without hyperparameter tuning based on the metric values.
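For part vii, scikit-learn's metrics module covers all of the requested values. The sketch below assumes y_pred holds one model's predictions on the held-out test set and would be repeated for each of the models from parts i, v and vi.

```python
from sklearn.metrics import (classification_report, precision_score,
                             recall_score, f1_score)

# Per-class precision, recall and F1, plus averaged summaries
print(classification_report(y_test, y_pred))

# Macro, micro and weighted averages computed explicitly
for avg in ("macro", "micro", "weighted"):
    print(avg,
          precision_score(y_test, y_pred, average=avg),
          recall_score(y_test, y_pred, average=avg),
          f1_score(y_test, y_pred, average=avg))
```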
2. [5 points] Based on the reading, explain ensemble learning in your own words.
3. [10 points] Briefly describe any one use case (related to your organization/institute) that can harness the benefits of machine learning.
 
