Project 3: Softmax Regression and NN

In this project, we will explore softmax regression and neural networks. The project files are available

for download in the p3 folder on the course Files page.

Files to turn in:

softmax.py      Implementation of softmax regression
nn.py           Implementation of the fully connected neural network
partners.txt    Lists the full names of all members in your team
writeup.pdf     Answers all the written questions in this assignment
                Any files you created for extra credit

You will be using helper functions in the following .py files and datasets:

utils.py        Helper functions for Softmax Regression and NN
data/*          Training and dev data for SR and NN

Please do not change the file names listed above. Submit partners.txt even if you do the project

alone. Only one member in a team (even if it is a cross-section team) is supposed to submit the project.

Autograding

Please do not change the names of any provided functions or classes within the code, or you will wreak havoc on the autograder. However, the correctness of your implementation -- not the autograder's output -- will be the final judge of your score. If necessary, we will review and grade assignments individually to ensure that you receive due credit for your work. Please do not import (potentially unsafe) system-related modules such as sys, exec, eval, etc.; otherwise the autograder will assign zero points without grading.

Part I    Softmax Regression [45%]

Files to edit/turn in for this part

softmax.py
The goal of this part of the project is to implement Softmax Regression in order to classify the MNIST

digit dataset. Softmax Regression is essentially a two-layer neural network where the output layer

applies the Softmax cost function, a multiclass generalization of the logistic cost function.

In logistic regression, we have a hypothesis function of the form

h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}

where $\theta$ is our weight vector. Like the hyperbolic tangent function, the logistic function is also a sigmoid function with the characteristic 's'-like shape, though it has a range of (0, 1) instead of (-1, 1).

Note that this is technically not a classifier, since it returns probabilities instead of a predicted class, but it's easy to turn it into a classifier by simply choosing the class with the highest probability.

Since logistic regression is used for binary classification, it is easy to see that:

P(y = 1 \mid x; \theta) = h_\theta(x) = \frac{e^{\theta^T x}}{1 + e^{\theta^T x}}

Similarly,

P(y = 0 \mid x; \theta) = 1 - h_\theta(x) = \frac{1}{1 + e^{\theta^T x}}

From this form it appears that we can assign the vector $\theta$ as the weight vector for class 1 and the zero vector as the weight vector for class 0. Our probability formulas are now unified into one equation:

P(y = k \mid x; W) = \frac{e^{w_k^T x}}{\sum_{j} e^{w_j^T x}}

This immediately motivates generalization to classification with more than 2 classes. By assigning a separate weight vector $w_k$ to each class, for each example $x$ we can predict the probability that it is class $k$, and again we can classify by choosing the most probable class. A more compact way of representing the values $w_k^T x$ is the matrix product $WX$, where each row $k$ of $W$ is $w_k^T$. We can also represent a dataset $\{x^{(1)}, \ldots, x^{(N)}\}$ with a matrix $X$ where each column is a single example.
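As a concrete illustration of this matrix view, here is a minimal NumPy sketch of prediction with a weight matrix $W$ of shape (K, d) and a data matrix $X$ of shape (d, N). The function name predict_classes is only illustrative and is not the interface used in softmax.py.

import numpy as np

def predict_classes(W, X):
    # Illustrative sketch only: W is (K, d) with row k equal to w_k^T, X is (d, N).
    scores = W @ X                      # (K, N); entry [k, i] is w_k^T x_i
    probs = np.exp(scores)              # (the numerical-stability shift of Qsr3 is omitted here)
    probs = probs / probs.sum(axis=0)   # column i holds P(y_i = k | x_i) for every k
    return probs.argmax(axis=0)         # most probable class for each example, shape (N,)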

Qsr1 (10%)

 

(1) Show that the probabilities sum to 1.

 

(2) What are the dimensions of $W$? $X$? $WX$?

We can also train this model with an appropriate loss function. The Softmax loss function is given by

J(W) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} 1\{y_i = k\} \log \frac{e^{w_k^T x_i}}{\sum_{j=1}^{K} e^{w_j^T x_i}}

where $N$ is the number of examples, $K$ is the number of classes, and $1\{\cdot\}$ is an indicator variable that equals 1 when the statement inside the brackets is true, and 0 otherwise. The gradient (which you will not derive) is given by:

\nabla_{w_k} J(W) = -\frac{1}{N} \sum_{i=1}^{N} x_i \left( 1\{y_i = k\} - P(y_i = k \mid x_i; W) \right)

Note that the indicator and the probabilities can be represented as matrices, which makes the code

for the loss and the gradient very simple. (See here

(http://ufldl.stanford.edu/tutorial/supervised/SoftmaxRegression/) for more details)
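To make the matrix formulation concrete, here is a minimal NumPy sketch of the vectorized loss and gradient under the conventions above ($W$ of shape (K, d), $X$ of shape (d, N), integer labels y of shape (N,)). The function name and interface are illustrative assumptions, not the stub provided in softmax.py.

import numpy as np

def softmax_cost_and_grad(W, X, y, K):
    # Illustrative sketch only; softmax.py defines its own interface.
    N = X.shape[1]
    scores = W @ X                          # (K, N) matrix of w_k^T x_i
    scores = scores - np.max(scores)        # stability shift (see Qsr3); cancels within each column
    probs = np.exp(scores)
    probs = probs / probs.sum(axis=0)       # (K, N); entry [k, i] is P(y_i = k | x_i; W)
    indicator = np.zeros((K, N))
    indicator[y, np.arange(N)] = 1.0        # one-hot indicator matrix for 1{y_i = k}
    cost = -np.sum(indicator * np.log(probs)) / N
    grad = -(indicator - probs) @ X.T / N   # (K, d); row k is the gradient w.r.t. w_k
    return cost, grad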

softmax.py contains a mostly-complete implementation of Softmax Regression. A code stub also has

been provided in run_softmax.py. Once you correctly implement the incomplete portions of

softmax.py, you will be able to run run_softmax.py in order to classify the MNIST digits.

Qsr2 (15%)

(1) Complete the implementation of the cost function.

 

(2) Complete the implementation of the predict function.

Check your implementation by running:

>>> python run_softmax.py

The output should be:

RUNNING THE L-BFGS-B CODE

* * *

Machine precision = 2.220D-16
 N = 7840    M = 10

At X0 0 variables are exactly at the bounds

At iterate   0    f=  2.30259D+00    |proj g|=  6.37317D-02
At iterate   1    f=  1.52910D+00    |proj g|=  6.91122D-02
At iterate   2    f=  7.72038D-01    |proj g|=  4.43378D-02
...
At iterate 401    f=  2.19686D-01    |proj g|=  2.52336D-04
At iterate 402    f=  2.19665D-01    |proj g|=  2.04576D-04

* * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

* * *

   N    Tit   Tnf  Tnint  Skip  Nact      Projg          F
7840    402   431      1     0     0  2.046D-04  2.197D-01

F = 0.21966482316858085

STOP: TOTAL NO. of ITERATIONS EXCEEDS LIMIT

Cauchy time                 0.000E+00 seconds.
Subspace minimization time  0.000E+00 seconds.
Line search time            0.000E+00 seconds.
Total User time             0.000E+00 seconds.

Accuracy: 93.99%

Qsr3 (10%)

In the cost function, we see the line

W_X = W_X - np.max(W_X)

This means that each entry is reduced by the largest entry in the matrix.

 

(1) Show that this does not affect the predicted probabilities.

 

(2) Why might this be an optimization over using W_X? Justify your answer.
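As a quick numerical illustration of the effect of the shift in the line above (this does not replace the written justification asked for in Qsr3), compare a naive column-wise softmax with a shifted one on large scores. The helper below is a standalone sketch, not code from softmax.py.

import numpy as np

def softmax_cols(scores):
    # Column-wise softmax of a (K, N) score matrix; standalone sketch.
    e = np.exp(scores)
    return e / e.sum(axis=0)

scores = np.array([[1000.0, 1001.0],
                   [ 999.0, 1000.0]])

naive   = softmax_cols(scores)                    # exp(1000) overflows to inf, giving nan entries
shifted = softmax_cols(scores - np.max(scores))   # finite; roughly [[0.731, 0.731], [0.269, 0.269]]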

Qsr4 (10%)

Use the learningCurve function in runClassifier.py to plot the accuracy of the classifier as a function of the number of examples seen. Include the plot in your write-up. Do you observe any overfitting or underfitting? Discuss and explain what you observe.
Part II    Neural Network

In this part of the project, we'll implement a general fully-connected neural network for the MNIST dataset. For Qnn1, you will complete nn.py. For Qnn2, create your own files.

A code stub also has been provided in run_nn.py. Once you correctly implement the incomplete

portions of nn.py, you will be able to run run_nn.py in order to classify the MNIST digits.

The dataset is included under ./data/. You will be using the following helper functions in "utils.py". In the following description, let K, N, d denote the number of classes, the number of samples, and the number of features.

Selected functions in "utils.py":

loadMNIST(image_file, label_file)  # returns the data matrix X with shape (d, N) and labels with shape (N,)
onehot(labels)  # encodes labels into one-hot style with shape (K, N)
acc(pred_label, Y)  # calculates the accuracy of a prediction given ground truth Y, where pred_label has shape (N,) and Y has shape (N, K)
data_loader(X, Y=None, batch_size=64, shuffle=False)  # iterator that yields X, Y with shapes (d, batch_size) and (K, batch_size)
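For orientation, the helpers might be used along the following lines. This is only a sketch based on the signatures above; the MNIST file names passed to loadMNIST are placeholders, not the actual names in the project's data folder.

from utils import loadMNIST, onehot, data_loader

# Placeholder file names -- check ./data/ for the actual ones.
X, labels = loadMNIST("data/train_images", "data/train_labels")   # X: (d, N), labels: (N,)
Y = onehot(labels)                                                # (K, N) one-hot targets

for X_batch, Y_batch in data_loader(X, Y, batch_size=64, shuffle=True):
    # X_batch: (d, batch_size), Y_batch: (K, batch_size)
    pass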

Qnn1 (20% for Qnn1.1, 1.2, 1.3 and 5% for Qnn 1.4) Implement the NN

The scaffold has been built for you. Initialize the model and print the architecture with the following:

>>> from nn import NN, Relu, Linear, SquaredLoss
>>> from utils import data_loader, acc, save_plot, loadMNIST, onehot
>>> model = NN(Relu(), SquaredLoss(), hidden_layers=[128,128])
>>> model.print_model()

Two activation functions (Relu, Linear) and self.predict(X) have been implemented for you.
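For orientation, the forward computation such a network performs looks roughly like the sketch below (ReLU hidden layers followed by a linear output layer that feeds the loss). This is a generic illustration, not the internals of the nn.py scaffold.

import numpy as np

def forward(X, weights, biases):
    # Generic sketch: X is (d, N); weights[l] is (h_l, h_{l-1}); biases[l] is (h_l, 1).
    A = X
    for W, b in zip(weights[:-1], biases[:-1]):
        A = np.maximum(0.0, W @ A + b)    # ReLU hidden layer
    return weights[-1] @ A + biases[-1]   # linear output layer, shape (K, N), fed to the loss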

Qnn1.1 Implement the squared loss cost functions (TODO 0 & TODO 1)

Assume $\hat{Y}$ is the output of the last layer before the loss calculation (without activation), which is a K-by-N matrix, and $Y$ is the one-hot encoded ground truth of the same shape. Implement the following loss function and its gradient (you need to derive and implement the gradient of the loss function yourself). Notice that the loss functions are normalized by the batch size $N$:

L(\hat{Y}, Y) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{2} \| \hat{Y}_i - Y_i \|^2 = \frac{1}{2N} \| \hat{Y} - Y \|_F^2, where $\hat{Y}_i$ is the $i$-th column of $\hat{Y}$.
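In NumPy, the loss above and its gradient with respect to $\hat{Y}$ can be sketched as follows. The function names are illustrative and do not match the TODO stubs in nn.py, so adapt the idea to the scaffold's own interface.

import numpy as np

def squared_loss(Y_hat, Y):
    # Y_hat, Y: (K, N); loss normalized by the batch size N.
    N = Y.shape[1]
    return 0.5 * np.sum((Y_hat - Y) ** 2) / N

def squared_loss_grad(Y_hat, Y):
    # Gradient of the loss with respect to Y_hat, also (K, N).
    N = Y.shape[1]
    return (Y_hat - Y) / N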

Typically we would use the cross-entropy loss, the formula of which is provided for your reference (but you are only required to implement the squared loss):

L_{CE}(\hat{Y}, Y) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{NLL}(\hat{Y}_i, Y_i), where

\mathrm{NLL}(\hat{Y}_i, Y_i) = -\sum_{k=1}^{K} Y_{ki} \hat{Y}_{ki} + \log\left( \sum_{k=1}^{K} e^{\hat{Y}_{ki}} \right)
