CS 418: Assignment 3, k Nearest Neighbors
In this assignment, you will implement a K-Nearest Neighbor classifier from scratch and use it on three datasets: the iris dataset, the banknote dataset, and the MNIST dataset. You are not allowed to use the sklearn KNN learner.
First, download the iris dataset from here. This is the Iris flower species dataset; the task is to predict the flower species from the flower measurements.
There are 150 observations with 4 attributes and a class label. Here’s the description of data file:
First column: sepal length in cm
Second column: sepal width in cm
Third column: petal length in cm
Fourth column: petal width in cm
Fifth column: class label
Before using the data you need to perform two simple tasks:
Encode the class labels
Split the dataset into train and test sets. Keep 20% of the data for testing; the rest is the training data. Keep in mind that, to get good results, the labels must be distributed evenly across the train and test data (stratified sampling). You can use methods provided by the sklearn package, such as train_test_split from sklearn.model_selection (sklearn.cross_validation in older releases).
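The two preparation tasks above can be sketched in plain Python as follows. This is only an illustration of what label encoding and stratified sampling do, not the required approach: the helper names encode_labels and stratified_split are hypothetical, and the toy rows stand in for the real iris file. sklearn's train_test_split with its stratify argument is the library route the assignment permits.

```python
import random
from collections import defaultdict


def encode_labels(labels):
    """Map each distinct label to an integer code (sorted, so codes are reproducible)."""
    classes = sorted(set(labels))
    to_code = {name: code for code, name in enumerate(classes)}
    return [to_code[name] for name in labels], classes


def stratified_split(X, y, test_fraction=0.2, seed=0):
    """Hold out roughly test_fraction of every class, so class ratios match in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(y):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # random choice of which rows are held out
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    def pick(rows, idx):
        return [rows[i] for i in idx]

    return pick(X, train_idx), pick(X, test_idx), pick(y, train_idx), pick(y, test_idx)


# Toy stand-in for the data file: 3 classes with 10 rows each (feature values are made up).
names = ["Iris-setosa"] * 10 + ["Iris-versicolor"] * 10 + ["Iris-virginica"] * 10
X = [[float(i), float(i % 4)] for i in range(30)]
y, classes = encode_labels(names)
X_train, X_test, y_train, y_test = stratified_split(X, y)
```

With 10 rows per class and a 20% test fraction, each class contributes 2 rows to the test set and 8 to the training set.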
Proceed to implement the k-nearest neighbors algorithm. Recall that this simple algorithm only requires the following steps:
Step 1: Calculate the distance from the test point to each training observation (Euclidean distance)
Step 2: Find the set I of k observations with the smallest distances
Step 3: Assign a label by taking a majority vote on I
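A minimal from-scratch sketch of these three steps, using only the Python standard library, might look like the following. The function names and the default k=3 are assumptions made for illustration, and so is the tie-breaking rule: when two labels receive the same number of votes, this sketch returns the one whose neighbor was encountered first (the closer one).

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(X_train, y_train, query, k=3):
    # Step 1: distance from the query point to every training observation.
    distances = [
        (euclidean_distance(row, query), label)
        for row, label in zip(X_train, y_train)
    ]
    # Step 2: the set I of the k observations with the smallest distances.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority vote on I (ties go to the label seen first, i.e. the closer one).
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(X_train, y_train, X_test, y_test, k=3):
    """Fraction of test observations whose predicted label matches the true label."""
    correct = sum(
        knn_predict(X_train, y_train, query, k) == truth
        for query, truth in zip(X_test, y_test)
    )
    return correct / len(y_test)


# Toy data: two well-separated groups of points (values are made up).
X_toy = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_toy = [0, 0, 0, 1, 1, 1]
near_origin = knn_predict(X_toy, y_toy, [0.2, 0.1], k=3)
near_far_group = knn_predict(X_toy, y_toy, [5.5, 5.5], k=3)
```

Sorting all distances is the simplest way to find the k nearest neighbors; it is fast enough for small datasets such as iris.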
The Banknote Dataset contains several measures taken from photographs of genuine and forged banknotes. You can download the data from here.
There are two class labels indicating whether a given note is forged or not. There are 1,372 observations with 4 attributes and a class label. The description of data is given below:
First column: Variance of Wavelet Transformed image
Second column: Skewness of Wavelet Transformed image
Third column: Kurtosis of Wavelet Transformed image
Fourth column: Entropy of image
Fifth column: Class label
For this dataset:
MNIST consists of handwritten digit images of the numbers zero to nine. Each image contains a single grayscale digit drawn by hand and is represented as a 784-dimensional vector (28 pixels for both height and width) of floating-point numbers, where each value represents a pixel’s brightness. The training set has 60,000 examples and the test set has 10,000 examples.
Download the csv format of training data and test data.
(You can use 1000 test examples.)
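With 60,000 training vectors of 784 values each, computing distances one pair at a time in a Python loop becomes slow. One possible way to speed this up, sketched below with NumPy, is to compute all squared distances at once using the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b. The function names, the batch_size parameter, and the synthetic data are assumptions for illustration; loading the csv files is left out because their column layout is not described here. Ties in the vote go to the smallest label in this sketch.

```python
import numpy as np


def pairwise_sq_distances(test, train):
    """Squared Euclidean distances between all rows, shape (n_test, n_train)."""
    test_sq = np.sum(test ** 2, axis=1)[:, None]
    train_sq = np.sum(train ** 2, axis=1)[None, :]
    d2 = test_sq + train_sq - 2.0 * (test @ train.T)
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by rounding


def knn_predict_batch(train, train_labels, test, k=3, n_classes=10, batch_size=200):
    """Predict integer labels for all test rows, a batch at a time to limit memory use."""
    predictions = []
    for start in range(0, len(test), batch_size):
        d2 = pairwise_sq_distances(test[start:start + batch_size], train)
        # argpartition places the k smallest distances of each row in the first k slots.
        nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
        # Count the votes per class for every test row, then take the most frequent label.
        votes = np.apply_along_axis(
            np.bincount, 1, train_labels[nearest], minlength=n_classes
        )
        predictions.append(votes.argmax(axis=1))
    return np.concatenate(predictions)


# Synthetic stand-in for MNIST: two tight clusters of 784-dimensional vectors.
rng = np.random.default_rng(0)
train = np.vstack([rng.normal(0.0, 0.1, (20, 784)), rng.normal(1.0, 0.1, (20, 784))])
train_labels = np.array([3] * 20 + [7] * 20)
test = np.vstack([rng.normal(0.0, 0.1, (5, 784)), rng.normal(1.0, 0.1, (5, 784))])
predicted = knn_predict_batch(train, train_labels, test, k=3)
```

Squared distances are enough here because taking the square root does not change which neighbors are nearest. Batching matters at full scale: a 1000 × 60,000 matrix of 64-bit floats takes about 480 MB if held in memory at once.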
Rules
Submit everything through Gradescope and Blackboard. You will need to upload:
In order to make grading easier for your TA, please use the following format for naming your files:
netid-hw3-418 { .ipynb, .pdf }