Machine Learning 1, SS22 Homework 2
PCA. Neural Networks.
Contents

- PCA and Classification [16 points]
- Neural Networks [6 points + 3* points]
- Model selection using GridSearchCV from sklearn [3* points]

General remarks

Your submission will be graded based on:
- Correctness (Is your code doing what it should be doing? Is your derivation correct?)
- The depth of your interpretations (usually, only a couple of lines are needed).
- The quality of your plots (Is everything clearly visible in the print-out? Are axes labeled?)
- Your submission should run with Python 3.5+.

For this assignment, we will be using an implementation of the Multilayer Perceptron from scikit-learn. The documentation is available on the scikit-learn website. The two relevant multi-layer perceptron classes are MLPRegressor for regression and MLPClassifier for classification. For both classes (and all scikit-learn model implementations), calling the fit method trains the model, and calling the predict method with the training or testing data set gives the predictions for that data set, which you can use to calculate the training and testing errors.
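The fit/predict pattern described above can be illustrated with a minimal sketch; the synthetic dataset and the chosen hidden-layer size are placeholders, not part of the assignment data:

```python
# Minimal illustration of the scikit-learn fit/predict pattern.
# The synthetic dataset is a stand-in for the assignment's data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)               # training
train_acc = clf.score(X_tr, y_tr) # accuracy of predict() on the training set
test_acc = clf.score(X_te, y_te)  # accuracy of predict() on the test set
print(train_acc, test_acc)
```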
PCA and Classification [16 points]

Tasks:
- PCA for dimensionality reduction. Load the dataset (features and targets). Use PCA from sklearn.decomposition to reduce the dimensionality of the data. Create an instance of the PCA class and choose n_components (the number of principal components) such that about 85% of the variance is explained. In the report, state the n_components you used and the exact percentage of variance explained that you obtained.
  Hints: You will need to fit the model and apply the dimensionality reduction to the original features in order to obtain the data with reduced dimension (n_samples, n_components). Check the Attributes of this class to find which one gives the percentage of variance explained. To narrow your search, the number of principal components should satisfy 100 < n_components < 200.
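A sketch of this step, assuming the features are already loaded into an array X; the random matrix below is a placeholder for the real feature matrix, and passing a float to n_components is one convenient way to hit a variance target (an explicit integer works too):

```python
# Sketch of the PCA step; the random X is a stand-in for the real features.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))    # placeholder feature matrix

# A float n_components keeps the smallest number of components that
# together explain at least that fraction of variance.
pca = PCA(n_components=0.85)
X_reduced = pca.fit_transform(X)  # shape: (n_samples, n_components_)

explained = pca.explained_variance_ratio_.sum()
print(X_reduced.shape, f"{100 * explained:.2f}% of variance explained")
```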
- For each n_hid, report the accuracy on the train and test set, and the current loss. Answer the questions: How do we know (in general) if the model does not have enough capacity (the case of underfitting)? How do we know (in general) if the model starts to overfit? Does that happen with some architectures/models? (If so, say with what number of neurons that happens.) Which model would you choose here and why?
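The loop over hidden-layer sizes can be sketched as follows; the synthetic dataset and the particular n_hid values are stand-ins for the assignment's data and search range:

```python
# Hedged sketch: train/test accuracy and final loss for a few n_hid values.
# The dataset here is a synthetic stand-in for the assignment data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for n_hid in (2, 16, 128):  # placeholder widths
    clf = MLPClassifier(hidden_layer_sizes=(n_hid,), max_iter=500,
                        random_state=0).fit(X_tr, y_tr)
    # clf.loss_ holds the current (final) training loss after fitting.
    print(n_hid, clf.score(X_tr, y_tr), clf.score(X_te, y_te), clf.loss_)
```

A large gap between training and test accuracy as n_hid grows is the usual overfitting signal; poor accuracy on both sets suggests underfitting.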
- Variability of the performance. Choose the best-performing parameters that you found in the previous task, and vary the seed (parameter random_state) with 5 different values of your choice. (Note: When the seed is fixed, we can reproduce our results and avoid getting different results on every run.) When changing the seed, what exactly changes? Report the minimum and maximum accuracy, and the mean accuracy over the 5 runs together with the standard deviation, i.e., (mean ± std).
- Using a model with any (fixed) seed of your choice, plot the loss curve (loss over iterations). Hint: Check the Attributes of the classifier.
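The seed-variability check can be sketched like this; the synthetic dataset and the fixed hidden-layer size stand in for your best parameters from the previous task:

```python
# Sketch of the seed-variability check; only random_state changes per run.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

accs = []
for seed in (0, 1, 2, 3, 4):
    clf = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500,
                        random_state=seed).fit(X_tr, y_tr)
    accs.append(clf.score(X_te, y_te))

print(f"min={min(accs):.3f} max={max(accs):.3f} "
      f"mean±std={np.mean(accs):.3f}±{np.std(accs):.3f}")
# For the loss-curve plot: with the sgd/adam solvers, the per-iteration
# training loss is stored in clf.loss_curve_, ready for plt.plot.
```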
- Using a model with any (fixed) seed of your choice, calculate predictions on the test set. In the code, print the classification report and confusion matrix, and include either a screenshot of both or copy the values into the report. How could you calculate recall yourself from the support and the confusion-matrix entries? Explain in words what recall is. What is the most misclassified image? State which class (digit) it was and how you concluded that.
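The evaluation step can be sketched as follows; scikit-learn's digits dataset stands in for the assignment's data, and the classifier settings are placeholders:

```python
# Sketch of the evaluation step on a stand-in digits dataset.
from sklearn.datasets import load_digits
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(100,), max_iter=300,
                    random_state=0).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

print(classification_report(y_te, y_pred))
cm = confusion_matrix(y_te, y_pred)
print(cm)
# Recall for class k can be recomputed by hand from the matrix:
# the diagonal entry cm[k, k] (correct predictions for class k)
# divided by the row sum of row k (that class's support).
```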
Model selection using GridSearchCV from sklearn [3* points]

Finding the best-performing model can be cumbersome. We can use, for example, GridSearchCV to find the best architecture by trying out all the different combinations.
Tasks:
- We want to check all possible combinations of the parameters:
  - α ∈ {0.0, 0.001, 1.0}
  - activation ∈ {identity, logistic, relu}
  - solver ∈ {lbfgs, adam}
  - hidden_layer_sizes ∈ {(100,), (200,)}
- Create a dictionary of these parameters, as required by GridSearchCV from sklearn.model_selection. How many different architectures will be checked? (State the number of architectures and how you calculated it.)
- Set max_iter = 500, random_state = 0, early_stopping = True as default parameters of MLPClassifier.
- What was the best score obtained? Hint: Check the Attributes of the classifier.
- What was the best parameter set? Hint: Check the Attributes of the classifier.
Neural Networks [6 points + 3* points]

Neural Networks can be used for regression problems as well. In this task, we will train a neural network to approximate a function.
Tasks:
- Load the dataset (x-datapoints.npy are the features, y-datapoints.npy are the targets).
- Implement the function calculate_mse. In the report, include the code snippet.
- Train the network to solve the task (use MLPRegressor from sklearn). You can perform either a manual search or a random/grid search to find a good model. Vary at least 3 different numbers of neurons in the hidden layer.
  If you use manual search: Describe how you chose the final model, e.g., how many neurons you tried out, one or two layers, which optimizer, whether it was necessary to use early stopping (and if so, what percentage of the validation set was used), and which activation function you used for the neurons (hidden_layer_sizes).
  If you use random/grid search (i.e., GridSearchCV from sklearn): Report which dictionary of parameters you used and what the best model was. You have to try out at least different numbers of neurons, different optimizers, and some form of regularization. (In total, at least 8 different combinations should be checked by GridSearch.)
- For the final choice of the model, report the final loss achieved.
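The regression task can be sketched as follows; calculate_mse is one straightforward way to implement the requested function, and the noisy sine data is a stand-in for the .npy files:

```python
# Hedged sketch of the regression task; the sine data stands in for
# x-datapoints.npy / y-datapoints.npy.
import numpy as np
from sklearn.neural_network import MLPRegressor

def calculate_mse(y_true, y_pred):
    """Mean squared error: the mean of the squared residuals."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))            # stand-in features
y = np.sin(x).ravel() + rng.normal(0, 0.1, 200)  # stand-in targets

for n_hid in (5, 20, 100):  # vary the hidden-layer width
    reg = MLPRegressor(hidden_layer_sizes=(n_hid,), max_iter=2000,
                       random_state=0).fit(x, y)
    print(n_hid, calculate_mse(y, reg.predict(x)))
```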