CS 418: Assignment 3

k Nearest Neighbors

In this assignment, you will implement a k-nearest neighbors (KNN) classifier from scratch and use it on three datasets: the Iris dataset, the banknotes dataset, and the MNIST dataset. You are not allowed to use sklearn's KNN learner.

Iris Dataset (45 pts)

First, download the Iris dataset from here. This is the Iris flower species dataset; the task is to predict the flower species from the flower measurements.

There are 150 observations with 4 attributes and a class label. Here is the description of the data file:

First column:	sepal length in cm
Second column:	sepal width in cm
Third column:	petal length in cm
Fourth column:	petal width in cm
Fifth column:	Class label

Before using the data you need to perform two simple tasks:

Encode the class labels

Split the dataset into train and test sets. Keep 20% of the data for testing; the rest will be training data. Keep in mind that, to get good results, you need to make sure the labels are distributed evenly across the train and test data (stratified sampling). You can use methods provided by the sklearn package, such as train_test_split (found in sklearn.model_selection; older versions exposed it in sklearn.cross_validation).
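The two preparation tasks above can be sketched as follows. This is a minimal illustration using only the standard library, not the required approach; the helper names `encode_labels` and `stratified_split` are hypothetical, and sklearn's train_test_split with its `stratify` argument is an equally valid route.

```python
import random
from collections import defaultdict


def encode_labels(labels):
    """Map each distinct label to an integer code (sorted for determinism)."""
    classes = sorted(set(labels))
    to_code = {name: code for code, name in enumerate(classes)}
    return [to_code[name] for name in labels], classes


def stratified_split(rows, labels, test_fraction=0.2, seed=0):
    """Split so each class sends roughly test_fraction of its rows to the test set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # shuffle within the class only
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    def pick(idx, seq):
        return [seq[i] for i in idx]

    return (pick(train_idx, rows), pick(test_idx, rows),
            pick(train_idx, labels), pick(test_idx, labels))
```

With 50 observations per class and a 20% test fraction, each class contributes 10 observations to the test set and 40 to the training set.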

Proceed to implement the k-nearest neighbors algorithm. Recall that this simple algorithm only requires the following steps:

Step 1: Calculate the distance (Euclidean distance) from the test observation to every training observation

Step 2: Find the set I of the k observations with the smallest distances

Step 3: Assign a label by taking a majority vote on I
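The three steps can be sketched in a few lines. This is one possible from-scratch version, assuming Python 3.8+ for `math.dist` and features stored as sequences of numbers; the function name `knn_predict` is illustrative, and a NumPy-vectorized version would be faster on larger datasets.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Predict the label of one query point with plain k-nearest neighbors."""
    # Step 1: Euclidean distance from the query to every training point.
    distances = [(math.dist(x, query), label)
                 for x, label in zip(train_X, train_y)]
    # Step 2: the set I of the k observations with the smallest distances.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority vote over I.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

One design detail matters for even k: when two classes tie, `Counter.most_common` returns the label encountered first, and because the neighbors are sorted by distance, the tie goes to the class of the closest neighbor.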

Compare the distributions of all four features in each iris class using boxplots.
Start with k = 1 and plot the decision boundary using the first two features (sepal length and sepal width).
Perform the prediction using k = 2, 4, 6, 10 and plot the decision boundaries. How does the decision boundary change as the number of neighbors increases?
For all cases, report accuracy and confusion matrix.
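Accuracy and the confusion matrix can be computed without any library support. The sketch below assumes the labels have already been encoded as integers 0 to n_classes - 1; the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_matrix(y_true, y_pred, n_classes):
    """matrix[i][j] counts observations of true class i predicted as class j."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix
```

Rows correspond to true classes and columns to predicted classes, so correct predictions lie on the diagonal.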

Banknotes Dataset (35 pts)

The banknotes dataset contains several measures taken from photographs of genuine and forged banknotes. You can download the data from here.

There are two class labels indicating whether a given note is forged or not. There are 1,372 observations with 4 attributes and a class label. The description of data is given below:

First column:	Variance of Wavelet Transformed image
Second column:	Skewness of Wavelet Transformed image
Third column:	Kurtosis of Wavelet Transformed image
Fourth column:	Entropy of image
Fifth column:	Class label

For this dataset:

Perform 2-nearest-neighbor classification on the banknotes dataset using 80% of the data for training and the rest for testing. Report the accuracy and confusion matrix.
Replace majority-based voting with a method of your choosing. How does it affect the error rate?
Use two new distance measures, Manhattan distance and L₃ (the Minkowski distance with p = 3), and redo the previous step. How does changing the distance function affect the classification?
For all cases, report accuracy and confusion matrix.
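Both variations can be sketched together. The Minkowski distance with p = 1 is the Manhattan distance, p = 2 is the Euclidean distance, and p = 3 is L₃. For the voting rule, inverse-distance weighting is used here purely as an example of an alternative; the assignment leaves the choice open, and the names `minkowski`, `knn_predict_weighted`, and the `eps` parameter are assumptions of this sketch.

```python
from collections import defaultdict


def minkowski(a, b, p):
    """Minkowski distance: (sum_i |a_i - b_i|^p)^(1/p)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict_weighted(train_X, train_y, query, k, p=2, eps=1e-12):
    """KNN where each neighbor votes with weight 1 / (distance + eps)."""
    distances = sorted(
        ((minkowski(x, query, p), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    scores = defaultdict(float)
    for distance, label in distances[:k]:
        # eps avoids division by zero when the query equals a training point.
        scores[label] += 1.0 / (distance + eps)
    return max(scores, key=scores.get)
```

With weighted voting, a single very close neighbor can outvote several distant ones, which also makes exact ties unlikely for k = 2.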

MNIST Dataset (20 pts)

MNIST consists of images of handwritten digits from zero to nine. Each image contains a single grayscale digit drawn by hand and is represented as a 784-dimensional vector (28 × 28 pixels) of floating-point numbers, where each value represents a pixel's brightness. The training set has 60000 examples and the test set has 10000 examples.

Download the CSV versions of the training data and test data.
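Reading the CSV files might look like the sketch below. It assumes a layout that is common for MNIST CSV exports but is not stated in this handout: no header row, with the digit label in the first column followed by the 784 pixel values. Check the downloaded files before relying on it. The function name `read_mnist_rows` and the `limit` parameter are hypothetical.

```python
import csv


def read_mnist_rows(lines, limit=None):
    """Parse rows of 'label,pixel_1,...,pixel_784' into (images, labels).

    `lines` is any iterable of text lines, such as an open file.
    Assumes no header row and the label in the first column.
    """
    images, labels = [], []
    for row in csv.reader(lines):
        if limit is not None and len(labels) >= limit:
            break
        if not row:  # skip blank lines
            continue
        labels.append(int(row[0]))
        images.append([float(value) for value in row[1:]])
    return images, labels
```

Passing an open file object as `lines` with, for example, `limit=500` reads only the first 500 examples.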

Perform 2-nearest-neighbor classification on the MNIST dataset using 500, 1000, 2500, 5000, and 10000 training examples. How does the classification error change with the number of training examples? Plot it.

(You can use 1000 test examples.)

Report the confusion matrix of the best model.
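The experiment amounts to a loop over training-set sizes. The sketch below is a pure-Python illustration with hypothetical names; it takes the first n training examples for each size, which assumes the file is not sorted by label, and it would be slow on full 784-dimensional data, where a NumPy-vectorized distance computation is the practical choice.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Plain KNN with Euclidean distance and majority vote."""
    nearest = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def error_by_train_size(train_X, train_y, test_X, test_y, sizes, k=2):
    """Return {n: test error rate when training on the first n examples}."""
    errors = {}
    for n in sizes:
        subset_X, subset_y = train_X[:n], train_y[:n]
        wrong = sum(
            knn_predict(subset_X, subset_y, x, k) != y
            for x, y in zip(test_X, test_y)
        )
        errors[n] = wrong / len(test_y)
    return errors
```

The returned dictionary maps each training size to its error rate, so its keys and values can be passed directly to a plotting call as the x and y series.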

Rules

This is an individual assignment. It is not a group activity.
Please include some proper explanations for your results. Do not submit a notebook with code cells only. You need to properly describe your methods and discuss/analyze your observations.
We will look at the quality of your work for grading. Your submission should be coherent and well documented.
There are many online discussions and demos on these datasets. It is okay if you look them up, but you must write your own code and analyze the data by yourself.
We will run your code through MOSS software to detect copying and plagiarism.

Submission

Submit everything through Gradescope and Blackboard. You will need to upload:

The Jupyter notebook containing all your work (.ipynb file), on Blackboard
A PDF version of your Jupyter notebook, on Gradescope

In order to make grading easier for your TA, please use the following format for naming your files:

netid-hw3-418 { .ipynb, .pdf }
