Project 3: Softmax Regression and NN
In this project, we will explore softmax regression and neural networks. The project files are available
for download in the p3 folder on the course Files page.
Files to turn in:
softmax.py    Implementation of softmax regression
nn.py         Implementation of a fully connected neural network
partners.txt  Lists the full names of all members of your team
writeup.pdf   Answers to all the written questions in this assignment
Any files you created for extra credit

You will be using helper functions in the following .py files and datasets:
utils.py      Helper functions for Softmax Regression and NN
data/*        Training and dev data for Softmax Regression and NN
Please do not change the file names listed above. Submit partners.txt even if you do the project alone. Only one member of a team (even if it is a cross-section team) should submit the project.

Autograding
Please do not change the names of any provided functions or classes within the code, or you will wreak havoc on the autograder. However, the correctness of your implementation -- not the autograder's output -- will be the final judge of your score. If necessary, we will review and grade assignments individually to ensure that you receive due credit for your work. Please do not import (potentially unsafe) system-related modules such as sys, exec, eval, and so on; otherwise the autograder will assign a score of zero without grading.
Part I Softmax Regression [45%]
Files to edit/turn in for this part
softmax.py
The goal of this part of the project is to implement Softmax Regression in order to classify the MNIST
digit dataset. Softmax Regression is essentially a two-layer neural network where the output layer
applies the Softmax cost function, a multiclass generalization of the logistic cost function.
In logistic regression, we have a hypothesis function of the form

h_w(x) = \frac{1}{1 + e^{-w^\top x}},

where w is our weight vector. Like the hyperbolic tangent function, the logistic function is also a sigmoid function with the characteristic 's'-like shape, though it has a range of (0, 1) instead of (-1, 1). Note that this is technically not a classifier, since it returns probabilities instead of a predicted class, but it is easy to turn it into a classifier by simply choosing the class with the highest probability.

Since logistic regression is used for binary classification, it is easy to see that:

P(y = 1 | x; w) = h_w(x) = \frac{e^{w^\top x}}{e^{w^\top x} + e^{0^\top x}}.

Similarly,

P(y = 0 | x; w) = 1 - h_w(x) = \frac{e^{0^\top x}}{e^{w^\top x} + e^{0^\top x}}.

From this form it appears that we can assign the vector w as the weight vector for class 1 and the zero vector 0 as the weight vector for class 0. Our probability formulas are now unified into one equation:

P(y = k | x; W) = \frac{e^{w_k^\top x}}{\sum_{j} e^{w_j^\top x}}.

This immediately motivates generalization to classification with more than 2 classes. By assigning a separate weight vector w_k to each class k, for each example x we can predict the probability that it is class k, and again we can classify by choosing the most probable class. A more compact way of representing the values w_k^\top x is the product W x, where each row of W is w_k^\top. We can also represent a dataset with a matrix X where each column is a single example.
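To make the matrix form concrete, here is a minimal NumPy sketch of these class probabilities. It is only an illustration, not the starter code in softmax.py, and the shapes (W of shape (K, d) with rows w_k^\top, X of shape (d, N) with examples as columns) are assumptions based on the description above.

import numpy as np

def softmax_probs(W, X):
    """Column j holds P(y = k | x_j; W) for every class k.

    Sketch only: assumes W has shape (K, d) and X has shape (d, N).
    """
    scores = W @ X                      # (K, N) matrix of w_k^T x_j values
    scores = scores - np.max(scores)    # stability shift, as in the provided cost function
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=0)  # each column sums to 1

With K = 2 and the second row of W fixed to the zero vector, this reduces to the binary logistic probabilities above.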
Qsr1 (10%)
(1) Show that the probabilities sum to 1.
(2) What are the dimensions of W? Of X? Of WX?
We can also train this model with an appropriate loss function. The Softmax loss function is given by

J(W) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} 1\{y_i = k\} \log \frac{e^{w_k^\top x_i}}{\sum_{j=1}^{K} e^{w_j^\top x_i}},

where m is the number of examples, K is the number of classes, and 1\{\cdot\} is an indicator variable that equals 1 when the statement inside the brackets is true, and 0 otherwise. The gradient (which you will not derive) is given by:

\nabla_{w_k} J(W) = -\frac{1}{m} \sum_{i=1}^{m} x_i \left( 1\{y_i = k\} - P(y_i = k | x_i; W) \right).

Note that the indicator and the probabilities can be represented as matrices, which makes the code for the loss and the gradient very simple. (See here (http://ufldl.stanford.edu/tutorial/supervised/SoftmaxRegression/) for more details.)
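As an illustration of how the indicator and probability matrices make the loss and gradient compact, here is a hedged NumPy sketch of the formulas above. It is not the softmax.py implementation; the shapes, label encoding, and absence of regularization are assumptions.

import numpy as np

def softmax_loss_and_grad(W, X, y):
    """Sketch of the vectorized loss and gradient; not the softmax.py API.

    Assumes W is (K, d), X is (d, m), and y holds integer labels in {0, ..., K-1}.
    """
    K, m = W.shape[0], X.shape[1]
    scores = W @ X                      # (K, m)
    scores = scores - np.max(scores)    # stability shift, as in the provided code
    P = np.exp(scores)
    P = P / P.sum(axis=0)               # P[k, i] = P(y_i = k | x_i; W)
    Ind = np.zeros((K, m))
    Ind[y, np.arange(m)] = 1.0          # indicator matrix: 1{y_i = k}
    loss = -np.sum(Ind * np.log(P)) / m
    grad = -(Ind - P) @ X.T / m         # (K, d); row k is the gradient w.r.t. w_k
    return loss, grad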
softmax.py contains a mostly-complete implementation of Softmax Regression. A code stub also has
been provided in run_softmax.py. Once you correctly implement the incomplete portions of
softmax.py, you will be able to run run_softmax.py in order to classify the MNIST digits.
Qsr2 (15%)
(1) Complete the implementation of the cost function.
(2) Complete the implementation of the predict function.
Check your implementation by running:
>>> python run_softmax.py
The output should be:
RUNNING THE L-BFGS-B CODE
* * *
Machine precision = 2.220D-16
N = 7840 M = 10
At X0 0 variables are exactly at the bounds
At iterate 0 f= 2.30259D+00 |proj g|= 6.37317D-02
At iterate 1 f= 1.52910D+00 |proj g|= 6.91122D-02
At iterate 2 f= 7.72038D-01 |proj g|= 4.43378D-02
...
At iterate 401 f= 2.19686D-01 |proj g|= 2.52336D-04
At iterate 402 f= 2.19665D-01 |proj g|= 2.04576D-04
* * *
Tit = total number of iterations
Tnf = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip = number of BFGS updates skipped
Nact = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F = final function value
* * *
N Tit Tnf Tnint Skip Nact Projg F
7840 402 431 1 0 0 2.046D-04 2.197D-01
F = 0.21966482316858085
STOP: TOTAL NO. of ITERATIONS EXCEEDS LIMIT
Cauchy time 0.000E+00 seconds.
Subspace minimization time 0.000E+00 seconds.
Line search time 0.000E+00 seconds.
Total User time 0.000E+00 seconds.
Accuracy: 93.99%
Qsr3 (10%)
In the cost function, we see the line
W_X = W_X - np.max(W_X)
This means that each entry is reduced by the largest entry in the matrix.
(1) Show that this does not affect the predicted probabilities.
(2) Why might this be an optimization over using W_X? Justify your answer.
Qsr4 (10%)
Use the learningCurve function in runClassifier.py to plot the accuracy of the classifier as a function of
the number of examples seen. Include the plot in your write-up. Do you observe any overfitting or
underfitting? Discuss and explain what you observe.
Part II Neural Network

In this part of the project, we'll implement a general fully-connected neural network for the MNIST dataset. For Qnn1 you will complete nn.py. For Qnn2, create your own files.
A code stub also has been provided in run_nn.py. Once you correctly implement the incomplete
portions of nn.py, you will be able to run run_nn.py in order to classify the MNIST digits.
The dataset is included under ./data/. You will be using the following helper functions in utils.py. In the following description, let K, N, and d denote the number of classes, the number of samples, and the number of features, respectively.
Selected functions in utils.py (a usage sketch follows this list):

loadMNIST(image_file, label_file)  # returns data matrix X with shape (d, N) and labels with shape (N,)
onehot(labels)  # encodes labels into one-hot style with shape (K, N)
acc(pred_label, Y)  # calculates the accuracy of the prediction given ground truth Y, where pred_label has shape (N,) and Y has shape (N, K)
data_loader(X, Y=None, batch_size=64, shuffle=False)  # iterator that yields X, Y with shapes (d, batch_size) and (K, batch_size)
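A minimal sketch of how these helpers might be chained together, assuming only the signatures listed above; the file paths below are placeholders, not the actual dataset filenames:

from utils import loadMNIST, onehot, data_loader

# Placeholder paths: the actual filenames under ./data/ may differ.
X, labels = loadMNIST("data/train-images", "data/train-labels")  # X: (d, N), labels: (N,)
Y = onehot(labels)                                               # Y: (K, N)

for X_batch, Y_batch in data_loader(X, Y, batch_size=64, shuffle=True):
    # X_batch: (d, batch_size), Y_batch: (K, batch_size)
    pass  # a training step would go here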
Qnn1 (20% for Qnn1.1, 1.2, 1.3 and 5% for Qnn1.4) Implement the NN
The scaffold has been built for you. Initialize the model and print the architecture with the following:
>>> from nn import NN, Relu, Linear, SquaredLoss
>>> from utils import data_loader, acc, save_plot, loadMNIST, onehot
>>> model = NN(Relu(), SquaredLoss(), hidden_layers=[128,128])
>>> model.print_model()
Two activation functions (Relu, Linear) and self.predict(X) have been implemented for you.
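As a quick sanity check after initialization, something along the following lines could evaluate the untrained model on loaded data. This is a sketch only: it assumes predict returns per-example class labels of shape (N,) and reuses the loadMNIST, onehot, and acc signatures listed earlier; the real nn.py interface may differ.

# Placeholder paths: the actual filenames under ./data/ may differ.
X, labels = loadMNIST("data/dev-images", "data/dev-labels")  # X: (d, N), labels: (N,)
Y = onehot(labels)                                           # Y: (K, N)

pred = model.predict(X)                 # assumed to return labels of shape (N,)
print("dev accuracy:", acc(pred, Y.T))  # acc expects ground truth with shape (N, K)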
Qnn1.1 Implement the squared loss cost function (TODO 0 & TODO 1)
Assume \hat{Y} is the output of the last layer before loss calculation (without activation), which is a K-by-N matrix. Y is the one-hot encoded ground truth of the same shape. Implement the following loss function and its gradient (you need to calculate and implement the gradient of the loss function yourself). Notice that the loss functions are normalized by the batch size N:

L(\hat{Y}, Y) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{2} \|\hat{Y}_i - Y_i\|^2 = \frac{1}{2N} \|\hat{Y} - Y\|_F^2,

where \hat{Y}_i and Y_i denote the i-th columns of \hat{Y} and Y.

Typically we would use the cross-entropy loss, the formula of which is provided for your reference (but you are only required to implement the squared loss):

L_{CE}(\hat{Y}, Y) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{NLL}(\hat{Y}_i, Y_i), where

\mathrm{NLL}(\hat{Y}_i, Y_i) = -\sum_{k=1}^{K} Y_{ki} \log \frac{e^{\hat{Y}_{ki}}}{\sum_{k'=1}^{K} e^{\hat{Y}_{k'i}}}.
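For the shapes involved, the following NumPy sketch mirrors the squared loss above and its gradient with respect to \hat{Y}. It is an illustration of the formula only, not the nn.py TODO code, whose class and method layout may differ.

import numpy as np

def squared_loss(Y_hat, Y):
    """Sketch of the loss above: Y_hat and Y are both (K, N)."""
    N = Y_hat.shape[1]
    return 0.5 * np.sum((Y_hat - Y) ** 2) / N

def squared_loss_grad(Y_hat, Y):
    """Gradient of the loss with respect to Y_hat, also (K, N)."""
    N = Y_hat.shape[1]
    return (Y_hat - Y) / N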