Elements of Machine Learning, WS 2022/2023

Jilles Vreeken and Aleksandar Bojchevski

Exercise Sheet #2: Classification

Before solving the exercises, read the instructions on the course website.

- For each theoretical problem, submit a single pdf file that contains your answer to the respective problem. This file may be a scan of your (legible) handwriting.
- For each practical problem, submit a single zip file that contains
  – the completed jupyter notebook (.ipynb) file,
  – any necessary files required to reproduce your results, and
  – a pdf report generated from the jupyter notebook that shows all your results.
- For the bonus question, submit a single zip file that contains
  – a pdf file that includes your answers to the theoretical part,
  – the completed jupyter notebook (.ipynb) file for the practical component,
  – any necessary files required to reproduce your results, and
  – a pdf report generated from the jupyter notebook that shows your results.
- Every team member has to submit a signed Code of Conduct.

Problem 1 (T, 8 Points). Logistic regression.

1. [4pts] In which setting is logistic regression applicable? Explain at least three problems with linear regression when applied in such a setting.
2. [1pt] What do we model with logistic regression? How are the independent variables and the obtained probabilities related?
3. [1pt] In general, what is the meaning of odds? Write down the formula and explain it in your own words. How do odds relate to logistic regression?
4. [1pt] Let $X$ be a scalar random variable. Prove that
   $$p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} \iff \log\frac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X .$$
   What is the relationship between the logistic and the logit function? Why is this relationship important? Explain.
5. [1pt] Let $Y_\theta$ be a binary random variable for which
   $$p(Y_\theta = 1) = \frac{e^\theta}{1 + e^\theta},$$
   where $\theta \in \mathbb{R}$, and define a parameter vector $\beta = [\beta_0, \ldots, \beta_p]$ and feature vector $X = [1, X_1, \ldots, X_p]$. Show that
   $$\frac{\operatorname{odds}(Y_{X'\beta})}{\operatorname{odds}(Y_{X\beta})} = \exp(\beta_i \delta),$$
   where $X'$ is obtained from $X$ by replacing $X_i$ with $X_i + \delta$, for some $\delta \in \mathbb{R}$ and any $i \in \{1, \ldots, p\}$, and explain the meaning of this equality in your own words.

Problem 2 (T, 10 Points). Bayes-optimal classifier.

The optimal misclassification error is achieved by the Bayes optimal classifier. This is the classifier that assigns every point $X$ to its most likely class. That is, the Bayes optimal classifier predicts
$$\hat{y} = f(x) = \arg\max_{y \in \{0, 1\}} P(Y = y \mid X = x) .$$

1. Consider a feature $X \in \mathbb{R}^2$ and a binary random variable $Y$, for which
   $$P(X \mid Y = 0) = \begin{cases} \frac{1}{\pi r^2} & \lVert X \rVert \le r \\ 0 & \text{otherwise} \end{cases}$$
   and
   $$P(X \mid Y = 1) = \mathcal{N}\!\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \sigma^2 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\right) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{\lVert X \rVert^2}{2\sigma^2}\right),$$
   with $P(Y = 0) = c\, P(Y = 1)$, where $r, \sigma > 0$ and $0 < c < 1$ are parameters.
   (a) [6pts] Derive the Bayes optimal classifier for $Y$ as a function of $r$ and $\sigma$.
   (b) [2pts] Draw the decision boundary for $\sigma = 1$, $r = e\sqrt{2} \approx 3.84$ and $c = \exp(-\tfrac{1}{3})$; explain your observations. What will happen to the decision boundary if we increase $c$ while keeping all the other parameters fixed?
   Note: For this, you have to find the region of $\mathbb{R}^2$ for which $P(Y = 1 \mid X) \ge P(Y = 0 \mid X)$. Hint: Use the Bayes formula given in the lecture.
2. [2pts] Given that the Bayes optimal classifier has the lowest misclassification error among all classifiers, why do we need any other classification method?
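For part 1(b), a quick numerical cross-check of a hand-drawn boundary can be helpful. The sketch below is not the requested derivation; it simply evaluates both unnormalised posteriors $P(X \mid Y = y)\,P(Y = y)$ on a grid using the densities and parameter values stated above. The grid range and resolution, and the plotting choices, are arbitrary assumptions.

```python
# Numerical sanity check for Problem 2(b): evaluate both unnormalised posteriors
# on a grid and show where class 1 wins. Parameter values follow the exercise.
import numpy as np
import matplotlib.pyplot as plt

sigma, r, c = 1.0, np.e * np.sqrt(2), np.exp(-1 / 3)   # values from part (b)
p_y1 = 1 / (1 + c)                                     # from P(Y=0) = c * P(Y=1)
p_y0 = c / (1 + c)

xx, yy = np.meshgrid(np.linspace(-5, 5, 400), np.linspace(-5, 5, 400))
sq_norm = xx**2 + yy**2

# class-conditional densities
px_y0 = np.where(sq_norm <= r**2, 1 / (np.pi * r**2), 0.0)          # uniform disk
px_y1 = np.exp(-sq_norm / (2 * sigma**2)) / (2 * np.pi * sigma**2)  # isotropic Gaussian

# Bayes rule: predict 1 where P(X|Y=1) P(Y=1) >= P(X|Y=0) P(Y=0)
predict_one = (px_y1 * p_y1 >= px_y0 * p_y0).astype(float)

plt.contourf(xx, yy, predict_one, levels=[-0.5, 0.5, 1.5], cmap="coolwarm", alpha=0.5)
plt.gca().set_aspect("equal")
plt.title("Region where the Bayes classifier predicts Y = 1")
plt.show()
```

Re-running the script with a larger $c$ is an easy way to see how the boundary reacts before arguing it analytically.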
Problem 3 (T, 5 Points). So Many Classifiers.

We now know four different classifiers: K-NN, LDA, QDA, and Logistic Regression (LR).

1. [4pts] Which assumptions does each of these models make w.r.t. the data distribution? Depending on the type of decision boundary, which of the respective methods would you recommend?
2. [1pt] Although LDA and LR often yield similar results, LR is often preferred. Give two reasons for this.

Problem 4 (P, 19 Points). Speech Recognition.

We will now consider LDA and QDA for a real-world speech recognition task. The data we consider contains digitized pronunciations of five phonemes: sh as in "she", dcl as in "dark", iy as the vowel in "she", aa as the vowel in "dark", and ao as the first vowel in "water". These phonemes correspond to the responses/classes (column name g). The dataset contains 256 predictors (log-periodograms, which are a common way of representing voice recordings in speech recognition). Use Practical_Problem_1.ipynb found in the a1_programming file from the course website. (One possible starting point for steps 1–3 is sketched after this problem.)

1. [1pt] Load the phoneme data set phoneme.csv and split the dataset into a training and test set according to the speaker column. Then exclude the row.names, speaker, and response column g from the features.
2. [2pts] Fit an LDA model to classify the response based on the predictors; then compute and report the train and test error. Useful functions: sklearn.model_selection.StratifiedShuffleSplit.
3. [3pts] Plot the projection of the training data onto the first two canonical coordinates of the LDA. Investigate the data projected onto further dimensions using the dimen parameter.
4. [4pts] Select the two phonemes aa and ao. Fit an LDA model on this data set and repeat the steps done in (2).
5. [6pts] Repeat steps (2) and (4) using QDA and report your findings. Would you prefer LDA or QDA in this example? Why?
6. [3pts] Generate confusion matrices for the LDA and QDA models for aa and ao. Which differences can you observe between the models?
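As referenced above, here is a minimal sketch of one possible workflow for steps 1–2, ending with the projection needed in step 3. It assumes the speaker column carries a train/test prefix, as in the classic ESL phoneme data; the column names row.names, speaker, and g come from the problem statement, and everything else should be adapted to what Practical_Problem_1.ipynb actually provides.

```python
# Sketch for Problem 4, steps 1-2 (not the reference solution).
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

df = pd.read_csv("phoneme.csv")

# Split by the speaker column: in the classic phoneme data the entries look like
# "train.dr1.mcpm0.sa1", so we assume a train/test marker can be read off of it.
is_train = df["speaker"].astype(str).str.startswith("train")
train, test = df[is_train], df[~is_train]

# Exclude row.names, speaker, and the response g from the features.
feature_cols = [c for c in df.columns if c not in ("row.names", "speaker", "g")]
X_train, y_train = train[feature_cols], train["g"]
X_test, y_test = test[feature_cols], test["g"]

lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)

print("train error:", 1 - lda.score(X_train, y_train))
print("test error: ", 1 - lda.score(X_test, y_test))

# Step 3 starts from the projection onto the first two canonical coordinates:
Z = lda.transform(X_train)[:, :2]
```

If the notebook instead expects a stratified random split, sklearn.model_selection.StratifiedShuffleSplit (mentioned in step 2) can replace the speaker-prefix logic, and sklearn.metrics.confusion_matrix(y_test, lda.predict(X_test)) is one way to obtain the matrices asked for in step 6.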
Problem 5 (Bonus). Shattering Data.

This bonus problem contains both theoretical and practical parts.

1. Theory. First, we dive into the classification flexibility of classifiers. We say that a family $F$ of classifiers can "shatter" a set of points $X$ if it can correctly classify them into two classes for any possible assignment of binary labels to them. The maximum number of distinct points that can be shattered by at least one classifier in the family is called the Vapnik–Chervonenkis (VC) dimension of the family. Formally, for a family of classifiers $F$ that can classify points lying in some domain $D$, we define
   $$VC(F; D) = \max\left\{ k \in \mathbb{N} \,\middle|\, \exists X \subseteq D,\ |X| = k : \ \forall S \subseteq X \ \exists f \in F \ \text{for which} \ S = \{x \in X \mid f(x) \ge 0\} \right\} .$$
   (a) [1pt] Show that $VC(F_{LC}; \mathbb{R}^2) = 3$, where $F_{LC}$ is the family of all linear classifiers over two features, without allowing any interactions. For this, you have to find an example of 3 points which can be shattered, but also prove that no set of 4 points can be shattered.
   (b) [1pt] Show that $VC(F_{LC}; \mathbb{R}^3) \ge 4$.
   (c) [1pt] Show that $VC(F_{QDA}; \mathbb{R}^2) \ge 6$, where $F_{QDA}$ is the family of all QDA classifiers.

2. Practical. Consider the notebook Bonus_Problem.ipynb. (The general feature-engineering pattern used in (b) and (d) is sketched after this problem.)
   (a) Open the dataset data_1 and study its distribution. Based on the intuition you acquired in the theoretical part of this problem, can it be classified sufficiently well with a linear classifier?
   (b) You will now modify the feature vectors of the observations in this dataset so that it can be classified with a linear classifier. To do so,
      – create at most two additional dummy variables (features) based on the original ones,
      – apply logistic regression on the derived feature vectors,
      – plot the distribution of the test dataset alongside the decision boundary of your classifier, and
      – compute and report your misclassification error.
      Explain your observations.
   (c) Open the dataset data_2 and study its distribution. Based on the intuition you acquired in the theoretical part of this problem, can it be classified sufficiently well with a linear classifier?
   (d) You will now modify the feature vectors of the observations in this dataset so that it can be classified with a linear classifier. To do so,
      – transform the original feature vectors to form a single-dimensional feature,
      – apply logistic regression on the derived feature vectors,
      – plot the distribution of the test dataset alongside the decision boundary of your classifier, and
      – compute and report your misclassification error.
      Explain your observations.
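Since data_1 and data_2 are not reproduced here, the sketch below only illustrates the general pattern asked for in 2(b) and 2(d) on synthetic stand-in data (two concentric rings): derive a feature from the originals, fit logistic regression on it, and report the misclassification error. The ring data and the squared-distance transform are assumptions for illustration; the transform you actually need must come from inspecting the datasets in Bonus_Problem.ipynb.

```python
# Generic pattern for Problem 5.2(b)/(d) on synthetic stand-in data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
radius = np.where(rng.random(n) < 0.5, 1.0, 3.0)          # inner vs. outer ring
angle = rng.uniform(0, 2 * np.pi, n)
X = np.c_[radius * np.cos(angle), radius * np.sin(angle)] + rng.normal(0, 0.2, (n, 2))
y = (radius > 2).astype(int)

# Derived feature: squared distance from the origin makes the rings linearly separable.
X_derived = (X ** 2).sum(axis=1, keepdims=True)

X_tr, X_te, y_tr, y_te = train_test_split(X_derived, y, test_size=0.3, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print("misclassification error:", 1 - clf.score(X_te, y_te))
```

The same pattern, with at most two derived dummy variables instead of a single one, covers part 2(b).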
