Individual Assignment M3

Individual Assignment M3.1

In this assignment, you will replicate the Play Golf example on a familiar dataset: Boston Housing! We will simplify the dataset a bit and give you three candidate columns (RM, LSTAT and PTRATIO) to choose from, with MEDV as the target. You will only grow the tree to a depth of one. To be completed ON YOUR OWN: any plagiarism (i.e. copying code or comments, or submitting work that is not your own) will be dealt with according to Graduate School policy.

 

Each student will enter their student ID and get a different set of rows to compute their calculations. Good luck!

 

Rubric:

100 pts: The notebook is clearly labeled and runs with no errors. Headers match the class example, with the numbers and symbols updated to match your own data. Calculations for the reduction in global standard deviation mimic the class example and are correct. The tree is plotted at the end with max_depth=1 to confirm you got the same answer.

80 pts: A minor error is carried throughout the notebook, lack of comments or headers, and/or no decision tree visualization to check the work.

50 pts: Major error, sloppy code and/or no decision tree visualization to check the work.

Data Prep

Let's read in the Boston Housing data and subset a few columns to make things more intuitive.

 

# enter your Student ID here

studentID = 1234567 # update this based on your student ID!

import pandas as pd

import numpy as np

# read in the Boston Housing data

df = pd.read_csv('https://raw.githubusercontent.com/michelpf/mlnd-boston-housing/master/housing.csv')

df.info() # note that this version only has a few columns (RM, LSTAT, PTRATIO and MEDV)

<class 'pandas.core.frame.DataFrame'>

RangeIndex: 489 entries, 0 to 488

Data columns (total 4 columns):

 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   RM       489 non-null    float64
 1   LSTAT    489 non-null    float64
 2   PTRATIO  489 non-null    float64
 3   MEDV     489 non-null    float64

dtypes: float64(4)

memory usage: 15.4 KB

So that you don't have to evaluate every possible split point, let's recode RM, LSTAT and PTRATIO as binary indicators: 1 if the value is above the column median, 0 otherwise.

 

df['RM'] = np.where(df['RM'] > np.median(df['RM']), 1, 0)

df['LSTAT'] = np.where(df['LSTAT'] > np.median(df['LSTAT']), 1, 0)

df['PTRATIO'] = np.where(df['PTRATIO'] > np.median(df['PTRATIO']), 1, 0)

df # we leave the target variable as is, we are doing regression!

     RM  LSTAT  PTRATIO      MEDV
0     1      0        0  504000.0
1     1      0        0  453600.0
2     1      0        0  728700.0
3     1      0        0  701400.0
4     1      0        0  760200.0
...  ...    ...      ...       ...
484   1      0        1  470400.0
485   0      0        1  432600.0
486   1      0        1  501900.0
487   1      0        1  462000.0
488   0      0        1  249900.0

489 rows × 4 columns

 

This is where each student gets a different dataset for modeling: the 15-row sample is seeded with your student ID.

 

df = df.sample(n=15, random_state=studentID)

df

     RM  LSTAT  PTRATIO      MEDV
99    1      0        0  697200.0
280   1      0        1  783300.0
167   0      1        0  401100.0
264   1      0        0  680400.0
388   0      1        1  105000.0
309   1      0        1  499800.0
254   1      0        0  651000.0
427   0      1        1  226800.0
390   0      1        1  585900.0
408   0      1        1  174300.0
118   0      1        0  428400.0
156   0      1        0  275100.0
113   0      1        0  392700.0
344   0      0        1  432600.0
322   1      0        1  466200.0

Good luck!

Now that you have your data, you can replicate and update the notebook from class. Your subheaders should look like this (with your numbers):

 

[Image: example of DTR from scratch.PNG]
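
As a rough illustration of the calculation the rubric asks for, here is a minimal sketch of standard deviation reduction for a depth-one split. It assumes the class example uses the population standard deviation (ddof=0) and weights each branch by its share of rows; check both against your class notebook. The names sd_reduction, sample, reductions and best_feature are illustrative only. The rows are the 15 shown above for studentID = 1234567, so your own sample will differ.

```python
import pandas as pd


def sd_reduction(data, feature, target="MEDV"):
    """Standard deviation reduction from splitting `data` on `feature`.

    Assumption: population standard deviation (ddof=0), with each
    branch weighted by its fraction of the rows.
    """
    # Standard deviation of the target before any split.
    sd_before = data[target].std(ddof=0)
    # Row-weighted average of the branch standard deviations after the split.
    sd_after = sum(
        (len(group) / len(data)) * group[target].std(ddof=0)
        for _, group in data.groupby(feature)
    )
    return sd_before - sd_after


# The 15 sampled rows shown above (studentID = 1234567).
sample = pd.DataFrame(
    {
        "RM":      [1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1],
        "LSTAT":   [0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
        "PTRATIO": [0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1],
        "MEDV": [
            697200.0, 783300.0, 401100.0, 680400.0, 105000.0,
            499800.0, 651000.0, 226800.0, 585900.0, 174300.0,
            428400.0, 275100.0, 392700.0, 432600.0, 466200.0,
        ],
    },
    index=[99, 280, 167, 264, 388, 309, 254, 427, 390, 408, 118, 156, 113, 344, 322],
)

# Reduction for each candidate column; the largest one is the root split.
reductions = {col: sd_reduction(sample, col) for col in ["RM", "LSTAT", "PTRATIO"]}
best_feature = max(reductions, key=reductions.get)
print(reductions)
print("Root split candidate:", best_feature)
```

Because every candidate column is binary after the median recoding, each one has only a single possible split, so comparing the three reductions is enough to pick the root node.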

 

# don't forget the viz at the end to check your work - all of the numbers should match!
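
The check mentioned in the comment above could be sketched as follows, assuming scikit-learn is the library used in class (the assignment does not name it). Note that DecisionTreeRegressor splits on squared error by default, so the impurity it reports for each node is a variance; taking its square root gives a population standard deviation to compare with hand calculations. The names sample, features, tree, root_feature and root_sd are illustrative, and the rows are again those shown for studentID = 1234567.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

# The 15 sampled rows shown above (studentID = 1234567).
sample = pd.DataFrame(
    {
        "RM":      [1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1],
        "LSTAT":   [0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
        "PTRATIO": [0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1],
        "MEDV": [
            697200.0, 783300.0, 401100.0, 680400.0, 105000.0,
            499800.0, 651000.0, 226800.0, 585900.0, 174300.0,
            428400.0, 275100.0, 392700.0, 432600.0, 466200.0,
        ],
    }
)
features = ["RM", "LSTAT", "PTRATIO"]

# Fit a one-level tree (a single split at the root).
tree = DecisionTreeRegressor(max_depth=1, random_state=0)
tree.fit(sample[features], sample["MEDV"])

# Column chosen at the root, and the standard deviation of MEDV there.
root_feature = features[tree.tree_.feature[0]]
root_sd = np.sqrt(tree.tree_.impurity[0])

print("Root split on:", root_feature)
print("Root standard deviation:", root_sd)
print(export_text(tree, feature_names=features))
```

In a notebook, sklearn.tree.plot_tree(tree, feature_names=features) draws the same tree as a figure (matplotlib required); export_text is used here only to keep the sketch free of plotting dependencies.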
