Individual Assignment M3
In this assignment, you will replicate the Play Golf exercise on a familiar dataset: Boston Housing! It is related to M3.1 but redone for classification. We will simplify the dataset, recode the target variable to 0/1, and give you only a few candidate columns to use. You will perform only one split and then stop (this is enough to evaluate your proficiency).
To be completed ON YOUR OWN - any plagiarism (i.e. copying code or comments, or submitting work that is not your own) will be dealt with according to Graduate School policy.
Each student will enter their student ID and get a different set of rows to use in their calculations. Good luck!
Let's read in the Boston Housing data and subset a few columns to make things more intuitive.
# enter your Student ID here
studentID = 1234567 # update this based on your student ID!
import pandas as pd
import numpy as np
# read in the Boston Housing data
df = pd.read_csv('https://raw.githubusercontent.com/michelpf/mlnd-boston-housing/master/housing.csv')
df.info() # note that this version only has a few columns (RM, LSTAT, PTRATIO and MEDV)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 489 entries, 0 to 488
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 RM 489 non-null float64
1 LSTAT 489 non-null float64
2 PTRATIO 489 non-null float64
3 MEDV 489 non-null float64
dtypes: float64(4)
memory usage: 15.4 KB
So that you don't have to evaluate ALL possible combinations, let's recode RM, LSTAT and PTRATIO based on their median values: 1 if the value is above the column median, 0 otherwise.
df['RM'] = np.where(df['RM'] > np.median(df['RM']), 1, 0)
df['LSTAT'] = np.where(df['LSTAT'] > np.median(df['LSTAT']), 1, 0)
df['PTRATIO'] = np.where(df['PTRATIO'] > np.median(df['PTRATIO']), 1, 0)
df['MEDV'] = np.where(df['MEDV'] > np.median(df['MEDV']), 1, 0) # we recoded the target variable!
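The four recoding lines above all follow the same pattern, so here is a minimal sketch of that pattern on a tiny made-up frame. The `toy` DataFrame and its values are hypothetical and are not taken from the housing data. It also shows what the strict `>` comparison does: a value exactly equal to the median is coded as 0.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, only to illustrate the median recode.
toy = pd.DataFrame({
    "RM": [5.0, 6.0, 7.0],          # median is 6.0
    "MEDV": [100.0, 300.0, 200.0],  # median is 200.0
})

# Same rule as in the assignment: 1 if strictly above the column median, else 0.
for col in toy.columns:
    toy[col] = np.where(toy[col] > toy[col].median(), 1, 0)

print(toy)
```

Because the comparison is strict, the middle RM value (6.0, equal to the median) ends up as 0 rather than 1.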
This is where each student gets a different dataset for modeling: the 15-row sample is seeded with your student ID.
df = df.sample(n=15, random_state=studentID)
df
     RM  LSTAT  PTRATIO  MEDV
322   1      0        1     1
388   0      1        1     0
113   0      1        0     0
427   0      1        1     0
408   0      1        1     0
118   0      1        0     0
390   0      1        1     1
344   0      0        1     0
309   1      0        1     1
264   1      0        0     1
254   1      0        0     1
167   0      1        0     0
280   1      0        1     1
156   0      1        0     0
99    1      0        0     1
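The sampling step is what makes each student's rows different but repeatable: `DataFrame.sample` with a fixed `random_state` returns the same rows on every run. A small sketch of that behavior, using a hypothetical `toy` frame rather than the housing data:

```python
import pandas as pd

# Hypothetical stand-in for the full dataset.
toy = pd.DataFrame({"value": range(100)})

# Same seed, same rows, same order.
first = toy.sample(n=15, random_state=1234567)
second = toy.sample(n=15, random_state=1234567)

print(first.equals(second))
```

Sampling is without replacement by default, so no row appears twice in the 15-row subset.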
Now that you have your data, you can replicate and update the notebook from class. Your subheaders should look like this (with your numbers):
# don't forget the viz at the end to check your work - all of the numbers should match!
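As a rough sketch of the single split, the snippet below scores each candidate column on the 15 rows displayed above (the sample for the placeholder student ID 1234567). It assumes the class notebook uses entropy and information gain, as Play Golf walkthroughs commonly do; if your notebook uses Gini impurity instead, swap out the impurity function. The names `sample`, `entropy`, and `information_gain` are illustrative and are not taken from the class notebook.

```python
import numpy as np
import pandas as pd

# The 15 rows displayed above: (index, RM, LSTAT, PTRATIO, MEDV).
rows = [
    (322, 1, 0, 1, 1), (388, 0, 1, 1, 0), (113, 0, 1, 0, 0),
    (427, 0, 1, 1, 0), (408, 0, 1, 1, 0), (118, 0, 1, 0, 0),
    (390, 0, 1, 1, 1), (344, 0, 0, 1, 0), (309, 1, 0, 1, 1),
    (264, 1, 0, 0, 1), (254, 1, 0, 0, 1), (167, 0, 1, 0, 0),
    (280, 1, 0, 1, 1), (156, 0, 1, 0, 0), (99, 1, 0, 0, 1),
]
sample = pd.DataFrame(
    rows, columns=["idx", "RM", "LSTAT", "PTRATIO", "MEDV"]
).set_index("idx")


def entropy(labels):
    """Shannon entropy (in bits) of a label series."""
    # value_counts only returns classes that occur, so every p is > 0.
    p = labels.value_counts(normalize=True).to_numpy()
    return float((p * np.log2(1.0 / p)).sum())


def information_gain(df, feature, target="MEDV"):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    parent = entropy(df[target])
    weighted = sum(
        len(group) / len(df) * entropy(group[target])
        for _, group in df.groupby(feature)
    )
    return parent - weighted


gains = {col: information_gain(sample, col) for col in ["RM", "LSTAT", "PTRATIO"]}
best = max(gains, key=gains.get)

for col, gain in gains.items():
    print(f"{col}: {gain:.4f}")
print("best split:", best)
```

For these 15 rows, RM gives the largest gain, because every row with RM = 1 also has MEDV = 1, so that child node is pure. Your own sample will generally produce different numbers.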