CMSC 422, Spring 22
Final Exam
Student’s first and last name: Grade (grader only):
Student’s Section Number :
Student’s UID:
University Honor Pledge:
I pledge on my honor that I have not given or received
any unauthorized assistance on this assignment/examination.
Confidential: Please DO NOT SAVE or SHARE this exam!
You can proceed and submit your solutions only if you consent.
Exam Guidelines / Rules
• Print this exam and work on the exam sheets. Alternatively, use a tablet to work on the exam
e-sheets.
• Write neatly. If we can’t read your response, you will receive no credit for it. Use the scrap paper
provided at the end of the exam for note-taking. You can use extra scrap paper if you need it.
Problem 1: Naïve Bayes Classifiers (20 pts)
Consider the binary classification problem where the class label Y ∈ {0, 1} and
each training example X has 2 binary attributes X = [X1, X2] ∈ {0, 1}².
Assume that the class priors are given, P(Y = 0) = P(Y = 1) = 0.5, and that the
conditional probabilities P(X1 | Y) and P(X2 | Y) are as follows:
                 Y = 0    Y = 1
P(X1 = 0 | Y)     0.9      0.5
P(X2 = 0 | Y)     0.3      0.8
(a) [6 pts] What is the naive Bayes prediction g(x) for the input
x = [x1, x2] = [0, 0]? Explain your reasoning. (A code sketch of this
computation follows this problem.)
(b) [6 pts] Assume you are not given the probability distributions P(Y),
P(X1 | Y) or P(X2 | Y), and are asked to estimate them from data instead.
How many parameters would you need to estimate?
(c) [4 pts] Assume you want to estimate the conditional probability distribution
P(Y | X1, X2) directly, without making the naive Bayes assumption.
How many parameters would you need to estimate from data?
(d) [4 pts] Assume you now want to estimate the joint probability distribution P(Y, X1, X2) directly. How many parameters would you need to estimate from data?
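As a sanity check on part (a), here is a minimal Python sketch of the naive Bayes computation. It assumes the table values above (P(X1 = 0 | Y = 0) = 0.9, P(X1 = 0 | Y = 1) = 0.5, P(X2 = 0 | Y = 0) = 0.3, P(X2 = 0 | Y = 1) = 0.8), which were reconstructed from a damaged original; substitute your own values if your table differs.

# Naive Bayes prediction sketch for Problem 1(a).
# The conditional probabilities below are assumed from the table above.
priors   = {0: 0.5, 1: 0.5}    # P(Y = y), given in the problem
p_x1_is0 = {0: 0.9, 1: 0.5}    # P(X1 = 0 | Y = y), assumed
p_x2_is0 = {0: 0.3, 1: 0.8}    # P(X2 = 0 | Y = y), assumed

def nb_score(y, x1, x2):
    # Unnormalized posterior: P(Y = y) * P(X1 = x1 | y) * P(X2 = x2 | y).
    px1 = p_x1_is0[y] if x1 == 0 else 1.0 - p_x1_is0[y]
    px2 = p_x2_is0[y] if x2 == 0 else 1.0 - p_x2_is0[y]
    return priors[y] * px1 * px2

scores = {y: nb_score(y, 0, 0) for y in (0, 1)}
print(scores)                                 # {0: 0.135, 1: 0.2} under these values
print("g(x) =", max(scores, key=scores.get))  # argmax over the two classes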
Problem 2: Forward and Backward Propagation (20 pts)
Consider the 2-layer neural network below. The output unit is a linear
unit that takes 3 inputs, h1, h2 and x2. The hidden units are rectified linear
units: h1 = ReLU(a1) and h2 = ReLU(a2). Recall that ReLU is defined
as:
ReLU(x) =0 if x < 0 (0.1)
ReLU(x) =x otherwise (0.2)
.
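This definition transcribes directly to Python; a short sketch:

def relu(x):
    # Equations (0.1)-(0.2): zero for negative inputs, identity otherwise.
    return x if x >= 0 else 0.0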
[Figure: the 2-layer network. Inputs x1 = +1 and x2 = −2; hidden-layer weights w11 = +1, w12 = −2, w21 = −1, w22 = −0.5; output weights v1 = −0.5, v2 = +1, v3 = +0.5; blanks for a1, h1, a2, h2, and ŷ to be filled in.]
(a) [5 pts] Using the forward propagation algorithm, fill in the figure above
with the values of a1, a2, h1, h2 and ŷ for the input example x = [+1, −2].
(b) [2 pts] Give the expression of ŷ as a function of x1, x2, w11, w12, w21, w22,
v1, v2, v3 and the ReLU(·) function.
(c) [6 pts] The correct class for example x = [x1, x2] = [+1, −2] is y = −1.0.
You run the backpropagation algorithm to minimize the squared error loss
l = (1/2)(y − ŷ)² on this single example. Derive the mathematical expressions
of the gradients of the loss l with respect to weights w11 and w22, and
calculate their numerical values.

∂l/∂w11 =

∂l/∂w22 =
(d) [4 pts] Indicate how the value of each parameter below changes after the
update: does it increase, decrease, or stay the same?
w11: w12:
w21: w22:
(e) [3 pts] Derive the update rule for parameter v3 when running the backpropagation algorithm on the same example x, with the squared loss l and a step size η = 1. Hint: you will need to (1) derive the appropriate gradient, (2) evaluate the gradient, (3) use the backpropagation update rule. (A worked sketch in code follows this problem.)
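The network figure above lost its wiring, so the following Python sketch assumes the common convention a1 = w11·x1 + w12·x2 and a2 = w21·x1 + w22·x2 (weight wji connects input i to hidden unit j), with ŷ = v1·h1 + v2·h2 + v3·x2 as the problem states. If the original figure wired the weights differently, the same chain-rule steps apply with the indices swapped.

# Forward/backward pass sketch for Problem 2, under the wiring assumption
# stated above. All constants come from the problem statement.

def relu(x):
    return x if x >= 0 else 0.0

def relu_grad(x):
    # Subgradient choice: derivative taken as 1 at x = 0,
    # matching "ReLU(x) = x otherwise" in equations (0.1)-(0.2).
    return 1.0 if x >= 0 else 0.0

x1, x2 = 1.0, -2.0
w11, w12, w21, w22 = 1.0, -2.0, -1.0, -0.5
v1, v2, v3 = -0.5, 1.0, 0.5
y = -1.0           # target class
eta = 1.0          # step size for part (e)

# Forward pass, part (a).
a1 = w11 * x1 + w12 * x2           # assumed wiring
a2 = w21 * x1 + w22 * x2           # assumed wiring
h1, h2 = relu(a1), relu(a2)
y_hat = v1 * h1 + v2 * h2 + v3 * x2
print("a1, a2, h1, h2, y_hat:", a1, a2, h1, h2, y_hat)

# Backward pass, parts (c) and (e): l = 0.5 * (y - y_hat)**2,
# so dl/dy_hat = y_hat - y; the rest is the chain rule.
dl_dyhat = y_hat - y
dl_dw11 = dl_dyhat * v1 * relu_grad(a1) * x1
dl_dw22 = dl_dyhat * v2 * relu_grad(a2) * x2
dl_dv3  = dl_dyhat * x2
print("dl/dw11, dl/dw22:", dl_dw11, dl_dw22)
print("v3 after one update:", v3 - eta * dl_dv3)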
Problem 3: Support Vector Machines (20 pts)
Consider the dataset illustrated in the Figure below. There are 3 positive examples (triangles) and 4 negative examples (circles). Let’s train a Support
Vector Machine classifier to separate positive from negative examples.
[Figure: scatter plot of the dataset (3 positive triangles, 4 negative circles) on axes x1 and x2, both ranging from 1 to 5.]
(a) [4 pts] Draw the SVM decision boundary in the Figure above.
(b) [4 pts] Give a function f(x1, x2) such that f(x1, x2) = 0 for all points
(x1, x2) that are on the decision boundary.
(c) [5 pts] Two examples are added to the previous dataset, as illustrated
below. We set the SVM hyperparameter C to a very large value (C →
+∞). Draw the SVM decision boundary in the Figure below. Explain.
[Figure: the augmented 9-point dataset on the same x1/x2 axes (1 to 5), for drawing the large-C boundary.]
(d) [5 pts] We now set the SVM hyperparameter C to a very small value
(C ≈ 0). Draw the SVM decision boundary below and explain.
[Figure: the same 9-point dataset on axes x1 and x2 (1 to 5), for drawing the small-C boundary.]
(e) [2 pts] We continue working with the same 9-point dataset, but change
the classifier to an SVM with a quadratic kernel. Draw a decision boundary
that can plausibly be obtained for a very large value of C below. (No
explanation required; a code sketch of how C and the kernel choice affect
the boundary follows the figure.)
[Figure: the same 9-point dataset on axes x1 and x2 (1 to 5), for drawing the quadratic-kernel boundary.]
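The exam figures above cannot be redrawn here, but the following scikit-learn sketch shows how one would probe parts (c)-(e). The coordinates below are hypothetical stand-ins for the triangles and circles in the figures; only the effect of C and of the quadratic kernel carries over.

# SVM sketch for Problem 3 (c)-(e). The data points are hypothetical
# placeholders: the actual coordinates come from figures that are lost.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 4], [2, 5], [2, 3],            # stand-ins for the positive triangles
              [3, 1], [4, 2], [4, 1], [5, 2],    # stand-ins for the negative circles
              [1, 1], [5, 5]])                   # stand-ins for the two added points
y = np.array([+1, +1, +1, -1, -1, -1, -1, -1, +1])

# (c) C -> +infinity approximates a hard margin: slack is punished so heavily
# that the boundary contorts to classify every point correctly if it can.
hard = SVC(kernel="linear", C=1e6).fit(X, y)

# (d) C ~ 0 makes slack nearly free: the wide margin dominates and the
# boundary follows the bulk of the data, largely ignoring hard points.
soft = SVC(kernel="linear", C=0.01).fit(X, y)

# (e) A quadratic kernel permits a curved (conic-section) decision boundary.
quad = SVC(kernel="poly", degree=2, C=1e6).fit(X, y)

for name, clf in (("hard", hard), ("soft", soft), ("quad", quad)):
    print(name, "support vectors per class:", clf.n_support_)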