
Titanic Exercises Part 1

Titanic Exercises

These exercises cover everything you have learned in this course so far. You will use the background information provided to train a number of different types of models on this dataset.

Background

The Titanic was a British ocean liner that struck an iceberg and sank on its maiden voyage in 1912 from the United Kingdom to New York. More than 1,500 of the estimated 2,224 passengers and crew died in the accident, making this one of the largest maritime disasters ever outside of war. The ship carried a wide range of passengers of all ages and both genders, from luxury travelers in first class to immigrants in the lower classes. However, not all passengers were equally likely to survive the accident. You will use real data about a selection of 891 passengers to predict which passengers survived.

Libraries and data

Use the titanic_train data frame from the titanic library as the starting point for this project.

library(titanic)    # loads titanic_train data frame
library(caret)
library(tidyverse)
library(rpart)

# 3 significant digits
options(digits = 3)

# clean the data - `titanic_train` is loaded with the titanic package
titanic_clean <- titanic_train %>%
    mutate(Survived = factor(Survived),
           Embarked = factor(Embarked),
           Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age), # NA age to median age
           FamilySize = SibSp + Parch + 1) %>%    # count family members
    select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)

Question 1: Training and test sets

Split titanic_clean into test and training sets. After running the setup code, titanic_clean should have 891 rows and 9 variables.

Set the seed to 42, then use the caret package to create a 20% data partition based on the Survived column. Assign the 20% partition to test_set and the remaining 80% partition to train_set.

How many observations are in the training set?

How many observations are in the test set?

What proportion of individuals in the training set survived?
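One way to approach the split is sketched below. It assumes the titanic, caret, and dplyr packages are installed (dplyr stands in for the full tidyverse here), and it repeats the cleaning step so that it runs on its own. The object name test_index is illustrative. Note that R 3.6.0 changed the default sampling algorithm, so a given seed can produce different partitions on older and newer R versions.

```r
library(titanic)  # provides titanic_train
library(caret)    # provides createDataPartition()
library(dplyr)    # assumption: dplyr alone is enough here (tidyverse also works)

# Cleaning step repeated from the setup code so this sketch runs on its own
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)

set.seed(42)
# Stratified 20% sample of row indices, balanced on Survived
test_index <- createDataPartition(titanic_clean$Survived, times = 1, p = 0.2, list = FALSE)
test_set <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

nrow(train_set)                  # observations in the training set
nrow(test_set)                   # observations in the test set
mean(train_set$Survived == "1")  # proportion of survivors in the training set
```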

Question 2: Baseline prediction by guessing the outcome

The simplest prediction method is randomly guessing the outcome without using additional predictors. These methods will help us determine whether our machine learning algorithm performs better than chance. How accurate are two methods of guessing Titanic passenger survival?

Set the seed to 3. For each individual in the test set, randomly guess whether that person survived or not by sampling from the vector c(0,1) (Note: use the default argument setting of prob from the sample function).

What is the accuracy of this guessing method?
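The guessing baseline might be sketched as follows. The setup and split are repeated so the block runs on its own, and the name guess is illustrative. sample() is called without a prob argument, so both outcomes are equally likely.

```r
library(titanic)
library(caret)
library(dplyr)

# Setup and split repeated from the earlier steps
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1, p = 0.2, list = FALSE)
test_set <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

set.seed(3)
# One random 0/1 guess per test-set passenger, default (uniform) probabilities
guess <- sample(c(0, 1), nrow(test_set), replace = TRUE)

# Accuracy: share of guesses that match the true outcome
mean(guess == test_set$Survived)
```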

Question 3b: Predicting survival by sex

Predict survival using sex on the test set: if the survival rate for a sex is over 0.5, predict survival for all individuals of that sex, and predict death if the survival rate for a sex is under 0.5.

What is the accuracy of this sex-based prediction method on the test set?
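A possible sketch of the sex-based rule follows. It assumes the survival rates are estimated on the training set and then applied to the test set. The names sex_rates, surviving_sexes, and sex_model are illustrative.

```r
library(titanic)
library(caret)
library(dplyr)

# Setup and split repeated from the earlier steps
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1, p = 0.2, list = FALSE)
test_set <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

# Survival rate per sex, estimated on the training set
sex_rates <- train_set %>%
  group_by(Sex) %>%
  summarize(rate = mean(Survived == "1"))

# Sexes whose training survival rate exceeds 0.5
surviving_sexes <- sex_rates$Sex[sex_rates$rate > 0.5]

# Predict 1 (survived) for those sexes, 0 otherwise
sex_model <- ifelse(test_set$Sex %in% surviving_sexes, 1, 0)
mean(sex_model == test_set$Survived)
```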

Question 4a: Predicting survival by passenger class

In the training set, which class(es) (Pclass) were passengers more likely to survive than die?

Select ALL that apply.

  • 1
  • 2
  • 3

Question 4b: Predicting survival by passenger class

Predict survival using passenger class on the test set: predict survival if the survival rate for a class is over 0.5, otherwise predict death.

What is the accuracy of this class-based prediction method on the test set?

Question 4c: Predicting survival by passenger class

Use the training set to group passengers by both sex and passenger class.

Which sex and class combinations were more likely to survive than die (i.e. >50% survival)?

Select ALL that apply.

  • female 1st class
  • female 2nd class
  • female 3rd class
  • male 1st class
  • male 2nd class
  • male 3rd class

Question 4d: Predicting survival by passenger class

Predict survival using both sex and passenger class on the test set. Predict survival if the survival rate for a sex/class combination is over 0.5, otherwise predict death.

What is the accuracy of this sex- and class-based prediction method on the test set?
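The combined rule could be sketched like this. It assumes that group survival rates come from the training set and are joined onto the test set by Sex and Pclass. The names group_rates and sex_class_model are illustrative, and the .groups argument requires dplyr 1.0 or later.

```r
library(titanic)
library(caret)
library(dplyr)

# Setup and split repeated from the earlier steps
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1, p = 0.2, list = FALSE)
test_set <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

# Survival rate for every sex/class combination in the training set
group_rates <- train_set %>%
  group_by(Sex, Pclass) %>%
  summarize(rate = mean(Survived == "1"), .groups = "drop")

# Attach each test passenger's group rate, then threshold at 0.5
sex_class_model <- test_set %>%
  left_join(group_rates, by = c("Sex", "Pclass")) %>%
  mutate(pred = ifelse(rate > 0.5, 1, 0)) %>%
  pull(pred)

mean(sex_class_model == test_set$Survived)
```

The same pattern with group_by(Pclass) alone gives the class-only rule from Question 4b.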

Question 5a: Confusion matrix

Use the confusionMatrix() function to create confusion matrices for the sex model, class model, and combined sex and class model. You will need to convert predictions and survival status to factors to use this function.

What is the "positive" class used to calculate confusion matrix metrics?

  • 0
  • 1

Which model has the highest sensitivity?

  • sex only
  • class only
  • sex and class combined

Which model has the highest specificity?

  • sex only
  • class only
  • sex and class combined

Which model has the highest balanced accuracy?

  • sex only
  • class only
  • sex and class combined

Question 5b: Confusion matrix

What is the maximum value of balanced accuracy from Q5a?
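A minimal sketch of the confusion matrix step for one model is shown below. For brevity it hard-codes the sex rule as "predict survival for females", which is an assumption rather than something derived in the block. The class and combined models can be passed through confusionMatrix() the same way. The name cm_sex is illustrative.

```r
library(titanic)
library(caret)
library(dplyr)

# Setup and split repeated from the earlier steps
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1, p = 0.2, list = FALSE)
test_set <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

# Assumed sex rule: predict survival (1) for females, death (0) for males
sex_model <- ifelse(test_set$Sex == "female", 1, 0)

# Predictions must be a factor with the same levels as the reference
cm_sex <- confusionMatrix(data = factor(sex_model, levels = c(0, 1)),
                          reference = test_set$Survived)

cm_sex$positive  # class treated as "positive"
cm_sex$byClass[c("Sensitivity", "Specificity", "Balanced Accuracy")]
```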

Question 6: F1 scores

Use the F_meas() function to calculate F1 scores for the sex model, class model, and combined sex and class model. You will need to convert predictions to factors to use this function.

Which model has the highest F1 score?

  • sex only
  • class only
  • sex and class combined

What is the maximum value of the F1 score?
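The F1 computation for a single model might look like the sketch below. As an assumption, the sex rule is again hard-coded as "predict survival for females". The other two models follow the same call. The name f1_sex is illustrative.

```r
library(titanic)
library(caret)
library(dplyr)

# Setup and split repeated from the earlier steps
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1, p = 0.2, list = FALSE)
test_set <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

# Assumed sex rule: predict survival (1) for females, death (0) for males
sex_model <- ifelse(test_set$Sex == "female", 1, 0)

# F_meas() expects factors; by default the first level of the reference
# is treated as the relevant class
f1_sex <- F_meas(data = factor(sex_model, levels = c(0, 1)),
                 reference = test_set$Survived)
f1_sex
```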

Titanic Exercises, part 2

Question 7: Survival by fare - LDA and QDA

Set the seed to 1. Train a model using linear discriminant analysis (LDA) with the caret lda method using fare as the only predictor.

What is the accuracy on the test set for the LDA model?

Set the seed to 1. Train a model using quadratic discriminant analysis (QDA) with the caret qda method using fare as the only predictor.

What is the accuracy on the test set for the QDA model?

Note: when training models for Titanic Exercises Part 2, please use the S3 method for class formula rather than the default S3 method of caret train() (See ?caret::train for details).
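The LDA and QDA fits could be sketched as follows, using the formula interface as the note requests. The names train_lda, train_qda, lda_acc, and qda_acc are illustrative. The MASS package, which ships with standard R installations, supplies the underlying lda and qda routines.

```r
library(titanic)
library(caret)
library(dplyr)

# Setup and split repeated from the earlier steps
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1, p = 0.2, list = FALSE)
test_set <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

# LDA with fare as the only predictor (formula interface)
set.seed(1)
train_lda <- train(Survived ~ Fare, method = "lda", data = train_set)
lda_acc <- mean(predict(train_lda, test_set) == test_set$Survived)

# QDA with the same predictor
set.seed(1)
train_qda <- train(Survived ~ Fare, method = "qda", data = train_set)
qda_acc <- mean(predict(train_qda, test_set) == test_set$Survived)

c(lda = lda_acc, qda = qda_acc)
```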

Question 8: Logistic regression models

Set the seed to 1. Train a logistic regression model with the caret glm method using age as the only predictor.

What is the accuracy of your model (using age as the only predictor) on the test set?

Set the seed to 1. Train a logistic regression model with the caret glm method using four predictors: sex, class, fare, and age.

What is the accuracy of your model (using these four predictors) on the test set?

Set the seed to 1. Train a logistic regression model with the caret glm method using all predictors. Ignore warnings about rank-deficient fit.

What is the accuracy of your model (using all predictors) on the test set?
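The three logistic regression fits might be sketched as below. The object names are illustrative. The all-predictor model is expected to warn about a rank-deficient fit, because FamilySize is a linear combination of SibSp and Parch.

```r
library(titanic)
library(caret)
library(dplyr)

# Setup and split repeated from the earlier steps
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1, p = 0.2, list = FALSE)
test_set <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

# Age only
set.seed(1)
train_glm_age <- train(Survived ~ Age, method = "glm", data = train_set)
acc_age <- mean(predict(train_glm_age, test_set) == test_set$Survived)

# Sex, class, fare, and age
set.seed(1)
train_glm_four <- train(Survived ~ Sex + Pclass + Fare + Age, method = "glm", data = train_set)
acc_four <- mean(predict(train_glm_four, test_set) == test_set$Survived)

# All predictors (rank-deficiency warnings can be ignored here)
set.seed(1)
train_glm_all <- train(Survived ~ ., method = "glm", data = train_set)
acc_all <- mean(predict(train_glm_all, test_set) == test_set$Survived)

c(age = acc_age, four = acc_four, all = acc_all)
```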

Question 9a: KNN model

Set the seed to 6. Train a KNN model on the training set using the caret train function. Try tuning with k = seq(3, 51, 2).

What is the optimal value of the number of neighbors k?

Question 9b: KNN model

Plot the KNN model to investigate the relationship between the number of neighbors and accuracy on the training set. Of these values of k, which yields the highest accuracy?

  • 7
  • 11
  • 17
  • 21
  • 23

Question 9c: KNN model

What is the accuracy of the KNN model on the test set?

Question 10: Cross-validation

Set the seed to 8 and train a new KNN model. Instead of the default training control, use 10-fold cross-validation where each partition consists of 10% of the total. Try tuning with k = seq(3, 51, 2).

What is the optimal value of k using cross-validation?

What is the accuracy on the test set using the cross-validated kNN model?
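A sketch of the cross-validated kNN fit follows. It assumes all predictors are used (Survived ~ .), which the exercise does not state explicitly, and the names train_knn_cv and knn_cv_acc are illustrative. Dropping the trControl argument gives caret's default resampling, as used in Question 9.

```r
library(titanic)
library(caret)
library(dplyr)

# Setup and split repeated from the earlier steps
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1, p = 0.2, list = FALSE)
test_set <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

set.seed(8)
# 10-fold cross-validation over the requested grid of k values
train_knn_cv <- train(Survived ~ .,
                      method = "knn",
                      data = train_set,
                      tuneGrid = data.frame(k = seq(3, 51, 2)),
                      trControl = trainControl(method = "cv", number = 10, p = 0.9))

train_knn_cv$bestTune  # k with the best cross-validated accuracy
knn_cv_acc <- mean(predict(train_knn_cv, test_set) == test_set$Survived)
knn_cv_acc
```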

Question 11a: Classification tree model

Set the seed to 10. Use caret to train a decision tree with the rpart method. Tune the complexity parameter with cp = seq(0, 0.05, 0.002).

What is the optimal value of the complexity parameter (cp)?

What is the accuracy of the decision tree model on the test set?
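The tree fit might be sketched as follows, again assuming all predictors are offered to the model. The names train_rpart and rpart_acc are illustrative.

```r
library(titanic)
library(caret)
library(dplyr)
library(rpart)

# Setup and split repeated from the earlier steps
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1, p = 0.2, list = FALSE)
test_set <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

set.seed(10)
# Tune the complexity parameter over the requested grid
train_rpart <- train(Survived ~ .,
                     method = "rpart",
                     data = train_set,
                     tuneGrid = data.frame(cp = seq(0, 0.05, 0.002)))

train_rpart$bestTune  # cp value selected by resampling
rpart_acc <- mean(predict(train_rpart, test_set) == test_set$Survived)
rpart_acc
```

The fitted tree is stored in train_rpart$finalModel, which can be printed or passed to plot() and text() to inspect the decision rules.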

Question 11b: Classification tree model

Inspect the final model and plot the decision tree.

Which variables are used in the decision tree?

Select ALL that apply.

  • Survived
  • Sex
  • Pclass
  • Age
  • Fare
  • Parch
  • Embarked

Question 11c: Classification tree model

Using the decision rules generated by the final model, predict whether the following individuals would survive.

  • A 28-year-old male
  • A female in the second passenger class
  • A third-class female who paid a fare of $8
  • A 5-year-old male with 4 siblings
  • A third-class female who paid a fare of $25
  • A first-class 17-year-old female with 2 siblings
  • A first-class 17-year-old male with 2 siblings

Question 12: Random forest model

Set the seed to 14. Use the caret train() function with the rf method to train a random forest. Test values of mtry = seq(1:7). Set ntree to 100.

What mtry value maximizes accuracy?

What is the accuracy of the random forest model on the test set?

Use varImp() on the random forest model object to determine the importance of various predictors to the random forest model.

What is the most important variable?

Be sure to report the variable name exactly as it appears in the code.
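A sketch of the random forest step is given below. It assumes the randomForest package is installed (caret's rf method depends on it) and that all predictors are used. The names train_rf and rf_acc are illustrative.

```r
library(titanic)
library(caret)
library(dplyr)

# Setup and split repeated from the earlier steps
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1, p = 0.2, list = FALSE)
test_set <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

set.seed(14)
# ntree is passed through train() to randomForest()
train_rf <- train(Survived ~ .,
                  method = "rf",
                  data = train_set,
                  ntree = 100,
                  tuneGrid = data.frame(mtry = seq(1:7)))

train_rf$bestTune  # mtry with the best resampled accuracy
rf_acc <- mean(predict(train_rf, test_set) == test_set$Survived)
rf_acc

varImp(train_rf)   # predictor importance, scaled to 0-100 by default
```

Because the formula interface expands factors into dummy variables, the names reported by varImp() can differ from the raw column names (for example, a level name is appended to Sex).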

 
