

CMSC 422, Spring 22

Final Exam

Student’s first and last name:

Grade (grader only):

Student’s Section Number:

Student’s UID:

University Honor Pledge:

I pledge on my honor that I have not given or received any unauthorized assistance on this assignment/examination.

Confidential: Please DO NOT SAVE or SHARE this exam!

You can proceed and submit your solutions only if you consent.

Exam Guidelines / Rules

• Print this exam and work on the exam sheets. Alternatively, use a tablet to work on the exam e-sheets.

• Write neatly. If we can’t read your response, you will receive no credit for it. Use the scrap paper provided at the end of the exam for note-taking. You can use extra scrap paper if you need it.

Problem 1: Naïve Bayes Classifiers (20 pts)

Consider the binary classification problem where class label $Y \in \{0, 1\}$ and each training example $X$ has 2 binary attributes $X = [X_1, X_2] \in \{0, 1\}^2$.

Assume that the class priors are given as $P(Y = 0) = P(Y = 1) = 0.5$, and that the conditional probabilities $P(X_1 \mid Y)$ and $P(X_2 \mid Y)$ are given as follows:

[Table: conditional probabilities $P(X_1 \mid Y)$ and $P(X_2 \mid Y)$; entries 0.9 and 0.5 (first row), 0.3 and 0.8 (second row).]

(a) [6 pts] What is the naive Bayes prediction $g(x)$ for the input $x = [x_1, x_2] = [0, 0]$? Explain your reasoning.
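For reference, the naive Bayes rule picks the class that maximizes $P(Y = y)\prod_i P(X_i = x_i \mid Y = y)$. The sketch below implements that rule for binary features. The function name and the probability values in the usage lines are hypothetical placeholders, not the entries of the table above.

```python
from typing import Dict, List, Sequence


def naive_bayes_predict(
    x: Sequence[int],
    prior: Dict[int, float],
    cond: Dict[int, List[float]],
) -> int:
    """Return argmax_y P(Y = y) * prod_i P(X_i = x_i | Y = y) for binary features.

    cond[y][i] is P(X_i = 1 | Y = y); P(X_i = 0 | Y = y) is 1 - cond[y][i].
    """
    def unnormalized_posterior(y: int) -> float:
        score = prior[y]
        for i, xi in enumerate(x):
            p_one = cond[y][i]
            score *= p_one if xi == 1 else 1.0 - p_one
        return score

    # The evidence P(x) is identical for every class, so it can be dropped from the argmax.
    return max(prior, key=unnormalized_posterior)


# Hypothetical values for illustration only; they are NOT the exam's table entries.
HYPOTHETICAL_PRIOR = {0: 0.5, 1: 0.5}
HYPOTHETICAL_COND = {0: [0.2, 0.6], 1: [0.7, 0.4]}

print(naive_bayes_predict([0, 0], HYPOTHETICAL_PRIOR, HYPOTHETICAL_COND))  # prints 0
```

When the priors are uniform, as in this problem, the prior factor cancels. The prediction then reduces to comparing $P(X_1 = x_1 \mid Y = y)\,P(X_2 = x_2 \mid Y = y)$ across the two classes.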

(b) [6 pts] Assume you are not given the probability distributions $P(Y)$, $P(X_1 \mid Y)$ or $P(X_2 \mid Y)$, and are asked to estimate them from data instead. How many parameters would you need to estimate?
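This sketch counts only free parameters. A distribution over $k$ outcomes has $k - 1$ of them, because its entries must sum to 1. Under that convention, the naive Bayes count grows linearly in the number of features. The helper below is a hypothetical sketch of the count, assuming categorical features that all share the same number of values.

```python
def naive_bayes_param_count(num_features: int, num_classes: int = 2, num_values: int = 2) -> int:
    # P(Y) over num_classes outcomes has num_classes - 1 free entries (they sum to 1).
    prior_params = num_classes - 1
    # Each P(X_i | Y = y) over num_values outcomes has num_values - 1 free entries,
    # and naive Bayes keeps one such table per (feature, class) pair.
    cond_params = num_features * num_classes * (num_values - 1)
    return prior_params + cond_params


print(naive_bayes_param_count(2))  # prints 5
```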

(c) [4 pts] Assume you want to estimate the conditional probability distribution $P(Y \mid X_1, X_2)$ directly, without making the naive Bayes assumption. How many parameters would you need to estimate from data?

(d) [4 pts] Assume you now want to estimate the joint probability distribution $P(Y, X_1, X_2)$ directly. How many parameters would you need to estimate from data?
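Dropping the conditional independence assumption makes the parameter count exponential in the number of features. The hypothetical helpers below count free parameters for a binary label and binary features. As before, they subtract one entry per distribution for the sum-to-one constraint.

```python
def full_conditional_param_count(num_features: int) -> int:
    # Binary Y given n binary features: one free entry P(Y = 1 | x) per setting x,
    # and there are 2**n settings of the features.
    return 2 ** num_features


def joint_param_count(num_features: int) -> int:
    # Joint over Y and n binary features: 2**(n + 1) cells, minus 1 because they sum to 1.
    return 2 ** (num_features + 1) - 1


for n in (2, 10, 30):
    print(n, full_conditional_param_count(n), joint_param_count(n))
```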

Problem 2: Forward and Backward Propagation (20 pts)

Consider the 2-layer neural network below. The output unit is a linear unit that takes 3 inputs, $h_1$, $h_2$ and $x_2$. The hidden units are rectified linear units: $h_1 = \mathrm{ReLU}(a_1)$ and $h_2 = \mathrm{ReLU}(a_2)$. Recall that ReLU is defined as:

$$\mathrm{ReLU}(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{otherwise} \end{cases}$$
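Backpropagation through a ReLU also needs its derivative. A minimal sketch follows. ReLU has no derivative at 0, and returning 0 there is a common convention that we assume here; the exam does not specify one.

```python
def relu(a: float) -> float:
    # ReLU(a) = 0 if a < 0, and a otherwise, matching the definition above.
    return 0.0 if a < 0 else a


def relu_grad(a: float) -> float:
    # 1 on the linear branch (a > 0), 0 on the flat branch (a < 0).
    # At a = 0 ReLU is not differentiable; returning 0 there is an assumed convention.
    return 1.0 if a > 0 else 0.0
```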

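[Figure: 2-layer network with the following values]

Inputs: $x_1 = +1$, $x_2 = -2$

Input-to-hidden weights: $w_{11} = +1$, $w_{12} = -2$, $w_{21} = -1$, $w_{22} = -0.5$

Output weights: $v_1 = -0.5$, $v_2 = +1$, $v_3 = +0.5$

To fill in: $a_1 =$ ____, $h_1 =$ ____, $a_2 =$ ____, $h_2 =$ ____, $\hat{y} =$ ____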

(a) [5 pts] Using the forward propagation algorithm, fill in the figure above with the values of $a_1$, $a_2$, $h_1$, $h_2$ and $\hat{y}$ for input example $x = [+1, -2]$.
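The forward pass can be sketched in a few lines, under three assumptions:

- The extracted figure does not show which index of $w_{ij}$ is the input and which is the hidden unit. This sketch assumes $w_{ij}$ connects input $x_i$ to hidden unit $j$. If the figure uses the opposite convention, swap $w_{12}$ and $w_{21}$.
- $v_1$, $v_2$, $v_3$ weight $h_1$, $h_2$, $x_2$ respectively.
- There are no bias terms.

The function name `forward` is hypothetical.

```python
from typing import Dict


def forward(x1: float, x2: float, w: Dict[str, float], v: Dict[str, float]) -> Dict[str, float]:
    # Assumed wiring: w["ij"] connects input x_i to hidden unit j; no bias terms.
    a1 = w["11"] * x1 + w["21"] * x2
    a2 = w["12"] * x1 + w["22"] * x2
    h1 = max(0.0, a1)  # ReLU
    h2 = max(0.0, a2)  # ReLU
    # Linear output unit over h1, h2 and the skip input x2 (v3 assumed to weight x2).
    y_hat = v["1"] * h1 + v["2"] * h2 + v["3"] * x2
    return {"a1": a1, "a2": a2, "h1": h1, "h2": h2, "y_hat": y_hat}


W = {"11": 1.0, "12": -2.0, "21": -1.0, "22": -0.5}
V = {"1": -0.5, "2": 1.0, "3": 0.5}
print(forward(1.0, -2.0, W, V))
```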

(b) [2 pts] Give the expression of $\hat{y}$ as a function of $x_1, x_2, w_{11}, w_{12}, w_{21}, w_{22}, v_1, v_2, v_3$ and the $\mathrm{ReLU}(\cdot)$ function.
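This expression rests on assumptions, because the edge layout of the figure is not recoverable here. It takes $w_{ij}$ to connect input $x_i$ to hidden unit $j$, $v_3$ to weight the skip input $x_2$, and the network to have no biases. Under those assumptions, the network computes the following (swap $w_{12}$ and $w_{21}$ for the transposed indexing):

$$\hat{y} = v_1\,\mathrm{ReLU}(w_{11}x_1 + w_{21}x_2) + v_2\,\mathrm{ReLU}(w_{12}x_1 + w_{22}x_2) + v_3\,x_2$$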

(c) [6 pts] The correct class for example $x = [x_1, x_2] = [+1, -2]$ is $y = -1.0$. You run the backpropagation algorithm to minimize the squared error loss $l = \frac{1}{2}(y - \hat{y})^2$ on this single example. Derive the mathematical expressions of the gradients of the loss $l$ with respect to weights $w_{11}$ and $w_{22}$, and calculate their numerical values.

$\dfrac{\partial l}{\partial w_{11}} =$ ____

$\dfrac{\partial l}{\partial w_{22}} =$ ____
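The chain rule for this network can be checked numerically with the sketch below. It reuses the assumed wiring from the forward pass: input $x_i$ feeds hidden unit $j$ through $w_{ij}$, $v_3$ weights $x_2$, and there are no biases. It also assumes a ReLU derivative of 0 at $a = 0$. The function name `squared_loss_gradients` is hypothetical.

```python
from typing import Dict


def relu_grad(a: float) -> float:
    # Assumed convention: derivative 1 for a > 0 and 0 for a <= 0.
    return 1.0 if a > 0 else 0.0


def squared_loss_gradients(
    x1: float, x2: float, y: float, w: Dict[str, float], v: Dict[str, float]
) -> Dict[str, float]:
    # Forward pass (assumed wiring: w["ij"] links input x_i to hidden unit j).
    a1 = w["11"] * x1 + w["21"] * x2
    a2 = w["12"] * x1 + w["22"] * x2
    h1, h2 = max(0.0, a1), max(0.0, a2)
    y_hat = v["1"] * h1 + v["2"] * h2 + v["3"] * x2

    # l = 0.5 * (y - y_hat)**2, hence dl/dy_hat = y_hat - y.
    delta = y_hat - y
    # Error signal at each hidden pre-activation: dl/da_j = delta * v_j * ReLU'(a_j).
    delta1 = delta * v["1"] * relu_grad(a1)
    delta2 = delta * v["2"] * relu_grad(a2)

    return {
        # dl/dw_ij = dl/da_j * x_i
        "w11": delta1 * x1, "w21": delta1 * x2,
        "w12": delta2 * x1, "w22": delta2 * x2,
        # Output weights: dl/dv = delta * (the input feeding that weight)
        "v1": delta * h1, "v2": delta * h2, "v3": delta * x2,
    }


W = {"11": 1.0, "12": -2.0, "21": -1.0, "22": -0.5}
V = {"1": -0.5, "2": 1.0, "3": 0.5}
grads = squared_loss_gradients(1.0, -2.0, -1.0, W, V)
print(grads["w11"], grads["w22"])
```

Under the assumed wiring, $a_2 < 0$, so $h_2$ sits on the flat part of the ReLU. Every weight feeding it therefore receives a zero gradient.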

(d) [4 pts] Indicate how the value of each parameter below changes after the update: does it increase, decrease, or stay the same?

$w_{11}$: ____  $w_{12}$: ____

$w_{21}$: ____  $w_{22}$: ____
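Backpropagation follows each gradient with a gradient-descent step. This part does not fix a step size, so assume any $\eta > 0$. The direction of change then depends only on the sign of the gradient:

$$w \leftarrow w - \eta\,\frac{\partial l}{\partial w}, \qquad \frac{\partial l}{\partial w} > 0 \Rightarrow w \text{ decreases}, \quad \frac{\partial l}{\partial w} < 0 \Rightarrow w \text{ increases}, \quad \frac{\partial l}{\partial w} = 0 \Rightarrow w \text{ unchanged}.$$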

(e) [3 pts] Derive the update rule for parameter $v_3$ when running the backpropagation algorithm on the same example $x$, with the squared loss $l$ and a step size $\eta = 1$. Hint: you will need to (1) derive the appropriate gradient, (2) evaluate the gradient, and (3) apply the backpropagation update rule.
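Assume, as the problem statement suggests, that $v_3$ is the weight on the skip input $x_2$. Then $\hat{y}$ depends on $v_3$ only through the term $v_3 x_2$, and the derivation takes a single chain-rule step:

$$\frac{\partial l}{\partial v_3} = \frac{\partial l}{\partial \hat{y}}\,\frac{\partial \hat{y}}{\partial v_3} = (\hat{y} - y)\,x_2, \qquad v_3 \leftarrow v_3 - \eta\,(\hat{y} - y)\,x_2 \;\overset{\eta = 1}{=}\; v_3 - (\hat{y} - y)\,x_2.$$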

Problem 3: Support Vector Machines (20 pts)

Consider the dataset illustrated in the Figure below. There are 3 positive examples (triangles) and 4 negative examples (circles). Let’s train a Support Vector Machine classifier to separate positive from negative examples.

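[Figure: the 7 training examples (3 positive triangles, 4 negative circles) in the $(x_1, x_2)$ plane; both axes range from 1 to 5.]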

(a) [4 pts] Draw the SVM decision boundary in the Figure above.

(b) [4 pts] Give a function $f(x_1, x_2)$ such that $f(x_1, x_2) = 0$ for all points $(x_1, x_2)$ that are on the decision boundary.
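A linear SVM classifies with the sign of an affine function, so any boundary drawn in part (a) can be written as $f(x_1, x_2) = w_1 x_1 + w_2 x_2 + b = 0$. The sketch below uses a hypothetical weight vector and bias, not read from the lost figure, to show how $f$, the predicted label, and the geometric margin relate. For the SVM solution, $w$ and $b$ are scaled so that the support vectors satisfy $|f(x)| = 1$, which makes the margin $1/\lVert w \rVert$.

```python
import math
from typing import Sequence


def decision_function(x: Sequence[float], w: Sequence[float], b: float) -> float:
    # f(x) = w . x + b; the decision boundary is the set of points where f(x) = 0.
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(x: Sequence[float], w: Sequence[float], b: float) -> int:
    # +1 on the positive side of the hyperplane, -1 otherwise.
    return 1 if decision_function(x, w, b) > 0 else -1


def geometric_margin(x: Sequence[float], y: int, w: Sequence[float], b: float) -> float:
    # Signed Euclidean distance of the labeled point (x, y) to the boundary.
    return y * decision_function(x, w, b) / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical boundary x1 + x2 - 6 = 0 (illustration only).
w, b = (1.0, 1.0), -6.0
print(predict((4.0, 4.0), w, b), predict((2.0, 2.0), w, b))
print(geometric_margin((4.0, 4.0), 1, w, b))
```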

(c) [5 pts] Two examples are added to the previous dataset, as illustrated below. We set the SVM hyperparameter $C$ to a very large value ($C \to +\infty$). Draw the SVM decision boundary in the Figure below. Explain.

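[Figure: the dataset with the two added examples (9 points in total) in the $(x_1, x_2)$ plane; both axes range from 1 to 5.]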
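The role of $C$ can be read off the standard soft-margin SVM primal, where $\xi_i$ is the slack of example $i$:

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0.$$

As $C \to +\infty$, any positive slack becomes prohibitively expensive. On linearly separable data, the solution therefore approaches the hard-margin SVM. As $C \to 0$, the $\frac{1}{2}\lVert w\rVert^2$ term dominates, and the optimizer accepts slack in exchange for a wider margin.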

(d) [5 pts] We now set the SVM hyperparameter $C$ to a very small value ($C \approx 0$). Draw the SVM decision boundary below and explain.

[Figure: the same 9-point dataset in the $(x_1, x_2)$ plane; both axes range from 1 to 5.]
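To see the trade-off numerically, the sketch below evaluates the hinge-loss form of the soft-margin objective, $\frac{1}{2}\lVert w\rVert^2 + C\sum_i \max(0,\, 1 - y_i(w^\top x_i + b))$. It compares two hand-picked hyperplanes on a hypothetical toy dataset, not the exam's points. It does not train an SVM; it only shows which of the two candidates the objective prefers for a large and a small $C$.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def soft_margin_objective(
    w: Sequence[float], b: float, data: List[Tuple[Point, int]], C: float
) -> float:
    # Unconstrained form of the soft-margin primal:
    # 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b))
    reg = 0.5 * sum(wi * wi for wi in w)
    hinge = sum(max(0.0, 1.0 - y * (w[0] * x[0] + w[1] * x[1] + b)) for x, y in data)
    return reg + C * hinge


# Hypothetical toy data (not the exam's points): one negative "outlier" at (3.5, 3.5).
DATA = [((4.0, 4.0), 1), ((5.0, 5.0), 1), ((1.0, 1.0), -1), ((2.0, 2.0), -1), ((3.5, 3.5), -1)]
TIGHT = ((2.0, 2.0), -15.0)  # separates every point, but with a narrow margin
WIDE = ((0.5, 0.5), -3.0)    # wide margin, but the outlier lands on the wrong side

for C in (100.0, 0.01):
    obj_tight = soft_margin_objective(*TIGHT, DATA, C)
    obj_wide = soft_margin_objective(*WIDE, DATA, C)
    preferred = "tight" if obj_tight < obj_wide else "wide"
    print(f"C={C}: tight={obj_tight:.3f}, wide={obj_wide:.3f} -> prefers {preferred}")
```

With these toy numbers, the tight hyperplane's objective is 4 for every $C$, while the wide one's is $0.25 + 1.5\,C$. The tight hyperplane is therefore preferred only when $C > 2.5$.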

(e) [2 pts] We continue working with the same 9-point dataset, but change the classifier to an SVM with a quadratic kernel. Below, draw a decision boundary that could plausibly be obtained for a very large value of $C$. (No explanation required.)

[Figure: the same 9-point dataset in the $(x_1, x_2)$ plane; both axes range from 1 to 5.]
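With a kernel, the SVM decision function becomes $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$. For a quadratic kernel, this is a second-degree polynomial in $x_1, x_2$, so the boundary $f(x) = 0$ is a conic section (for example an ellipse, parabola, or hyperbola) in the original plane. The sketch below assumes the inhomogeneous form $K(x, z) = (x^\top z + 1)^2$, since the exam does not specify one. It checks that this kernel equals an inner product of explicit degree-2 features.

```python
import math
from typing import Sequence, Tuple


def quadratic_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    # Inhomogeneous quadratic kernel K(x, z) = (x . z + 1)^2 (assumed form).
    return (sum(xi * zi for xi, zi in zip(x, z)) + 1.0) ** 2


def quadratic_features(x: Sequence[float]) -> Tuple[float, ...]:
    # Explicit feature map for 2-D inputs whose dot product reproduces the kernel.
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return (1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2)


x, z = (1.0, 3.0), (2.0, 4.0)
explicit = sum(a * b for a, b in zip(quadratic_features(x), quadratic_features(z)))
print(quadratic_kernel(x, z), explicit)
```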
