EEL-6812 Advances in Neural Networks Spring 2022
PROJECT 1: Wine Quality Assessment – Regression & Classification Tasks
Due:_____________: Students will submit a printed report and upload software deliverables to Canvas
The Original Data:
Wine makers from Portugal studied many types of wine both quantitatively and qualitatively. For each wine they measured 11 numerical chemical attributes (e.g., chlorides, density, pH value) and they asked 3 or more “experts” to assign a “quality value” (0 to 10), recording the median score from the experts.
The data is originally provided by the Machine Learning Repository of the University of California - Irvine (UCI) in 2 “Comma-Separated Values” (CSV) files:
winequality-red.csv ….. (1599 red wines)
"http://archive.ics.uci.edu/ml/machine-learning-databases/winequality/winequality-red.csv"
winequality-white.csv ….. (4898 white wines)
"http://archive.ics.uci.edu/ml/machine-learning-databases/winequality/winequality-white.csv"
This is how (for example) the beginning of winequality-white.csv looks when opened with Notepad. Notice that the first row of text is the list of attribute names ending, lastly, with the label ‘quality’:
[screenshot of the first lines of winequality-white.csv omitted]
Also notice that the items within each row (each “record”) are actually separated by “ ; ” (semicolons), not really by commas.
Adapting the data for a REGRESSION model (red & white together), target = quality (0 to 10)
The two files will be considered together (red & white wines) for a total of 4898 + 1599 = 6497 samples
and one more attribute (‘type’) will be added to each sample, to identify if the sample is from a red (type = 1) or a white (type = 0) wine.
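This preparation step can be sketched as follows (assuming pandas is available; the helper name `load_wines` is illustrative and not part of the provided notebook):

```python
import pandas as pd

def load_wines(red_source, white_source):
    """Read the two UCI files (semicolon-delimited, despite the .csv
    extension), tag each sample with the extra 'type' attribute
    (red = 1, white = 0), and stack them into one 6497-sample frame."""
    red = pd.read_csv(red_source, sep=";")
    white = pd.read_csv(white_source, sep=";")
    red["type"] = 1
    white["type"] = 0
    return pd.concat([red, white], ignore_index=True)

# Usage with the URLs given above:
# wines = load_wines(
#     "http://archive.ics.uci.edu/ml/machine-learning-databases/winequality/winequality-red.csv",
#     "http://archive.ics.uci.edu/ml/machine-learning-databases/winequality/winequality-white.csv")
```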
Therefore, the neural network for solving this regression problem will have:
From the original 6497 x 1 target array, ‘quality’, which can contain any of the original quality levels 0 to 10, we will create an ALTERNATIVE TARGET VECTOR, ‘level’, which will only indicate the “quality level” of each sample as follows:
‘quality’ value of the sample | ‘level’ for the sample | Meaning
3 or less                     | 1                      | BAD quality LEVEL
4, 5 or 6                     | 2                      | MEDIUM quality LEVEL
7 or 8                        | 3                      | GOOD quality LEVEL
9 or more                     | 4                      | EXCELLENT quality LEVEL
A CLASSIFICATION MODEL will provide as output an indication of the QUALITY LEVEL (BAD, MEDIUM, GOOD or EXCELLENT) of each wine sample.
To implement this, the 6497 x 1 ‘level’ array will be ‘one-hot-encoded’ to yield a 6497 x 4 array, where each of the 6497 samples will have as target a 4-element array in which three of the numbers are 0 and one is 1 (this is done as it was performed for the Reuters Newswires classification example in the book, p. 107, Listing 4.14).
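The mapping and the one-hot encoding could be sketched as below (a NumPy version; the textbook uses `keras.utils.to_categorical` for the same purpose, and the function names here are illustrative):

```python
import numpy as np

def quality_to_level(quality):
    """Map an original 0-10 'quality' score to the 4 levels above."""
    if quality <= 3:
        return 1            # BAD
    elif quality <= 6:
        return 2            # MEDIUM
    elif quality <= 8:
        return 3            # GOOD
    else:
        return 4            # EXCELLENT (9 or more)

def one_hot_levels(quality_array):
    """Return an (N, 4) one-hot array: level k lights up column k-1."""
    levels = np.array([quality_to_level(q) for q in quality_array])
    onehot = np.zeros((len(levels), 4))
    onehot[np.arange(len(levels)), levels - 1] = 1.0
    return onehot
```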
Therefore, the network to perform classification of the wines into the 4 LEVELS of quality will have:
In Summary, this project requires you to develop TWO TYPES OF MODELS:
The Jupyter Notebook WineQ_prep.ipynb (printout attached) retrieves the original winequality-white.csv and winequality-red.csv directly from the UCI repository, originally as Pandas DataFrames, and:
TEN SAMPLES WILL BE SET ASIDE FROM THE ORIGINAL VALIDATION SET, AS “TEST DATA [TS]”. This is not for extensive testing of the model. It is simply so that, once the final model is defined, the student can interpret the results obtained by the model on the 10 “test” inputs.
The data will be originally split into 75% (4872 samples) for training [TR] and 25% (1625 samples) for validation [TT]. You are only required to do “simple hold-out validation” in this project. (AFTER the 1625 inputs are separated for validation, 10 OF THOSE will be taken out for “testing”, leaving 1615 for effective use as the validation set.)
Please note that using random_state = 45 in train_test_split() will randomly shuffle the overall data (as it does by default) before splitting it, in a REPEATABLE way: you should get the same training and validation subsets every time you execute this command with this argument. You can use any integer as the ‘random seed’ (for example, 45), as long as it is the same every time.
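A sketch of this split (assuming scikit-learn; the random feature matrix below merely stands in for the real 6497 x 12 data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-ins for the real 6497 x 12 feature matrix and the quality targets.
features = np.random.rand(6497, 12)
targets = np.random.randint(0, 11, size=6497)

# 75% / 25% split; the fixed random_state shuffles repeatably.
x_train, x_val, y_train, y_val = train_test_split(
    features, targets, test_size=0.25, random_state=45)

# Carve the 10 "test" patterns [TS] out of the validation split,
# leaving 1615 samples for effective use as the validation set.
x_test, y_test = x_val[:10], y_val[:10]
x_val, y_val = x_val[10:], y_val[10:]
```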
YOU MAY USE THE WineQ_prep.ipynb NOTEBOOK TO GET STARTED ON EACH OF THE NOTEBOOKS THAT YOU MUST DEVELOP FOR THE 6 MODELS IN THIS PROJECT.
For this Part I, INCLUDE IN YOUR WRITTEN REPORT an “Introduction” section explaining the data set, the tasks (regression and 4-way classification) and how the data is prepared – YOU MAY USE THE TEXT IN THE FIRST 3 PAGES OF THIS DOCUMENT as the basis for your introduction. – No software deliverables (Program/Data Files) need to be submitted for this Part I.
In this project you are asked to:
II.1 Develop a (very simple) model (regmodl1) – This model must only have 1 hidden layer, and no more than 8 processing elements in that layer.
II.2 Develop a (better) model that actually would be capable of overfitting (regmodl2) – There must be at least 2 hidden layers in this model.
II.3 Use your iterative observations of the performance of the model in II.2 to modify / tune hyperparameters (for example, deciding how many epochs of training to allow) to arrive at a “Final Regression Model” (regmodl3), which will be considered the “best model” you could develop to solve this regression task. There must be at least 2 hidden layers in this model.
For each one of these parts (II.1, II.2 and II.3) you must perform the fitting of the corresponding model including validation data, so you can see AND INCLUDE IN YOUR REPORT the plots comparing training and validation loss per epoch. – During the fitting of the model, you must monitor the
Mean Absolute Error (MAE) as a metric. – INDICATE (NUMERICALLY) THE FINAL LOSS & MAE ON THE TRAINING SET, AND THE FINAL LOSS & MAE ON THE VALIDATION SET.
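As a minimal sketch (assuming Keras; the optimizer and the commented-out epoch count are placeholders you are expected to tune), a regmodl1-style network compiled to monitor MAE might look like:

```python
from tensorflow import keras
from tensorflow.keras import layers

# One hidden layer with at most 8 PEs, and a single linear output
# unit producing the predicted quality score. The 12 inputs are the
# 11 chemical attributes plus the added 'type' attribute.
regmodl1 = keras.Sequential([
    keras.Input(shape=(12,)),
    layers.Dense(8, activation="relu"),
    layers.Dense(1)                      # linear activation for regression
])
regmodl1.compile(optimizer="rmsprop",
                 loss="mse",             # mean squared error
                 metrics=["mae"])        # Mean Absolute Error, as required

# history = regmodl1.fit(x_train, y_train, epochs=100, batch_size=64,
#                        validation_data=(x_val, y_val))
# history.history then holds the per-epoch losses for the required plots.
```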
Once you have defined and fitted your “best model” (regmodl3), use the method “predict” to find the value of the output activation that your regmodl3 generates for each of the 10 test patterns, and compare them to the 10 target values of the test set. With all that information, fill a table like this:
Pattern # | f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 | f10 | f11 | f12 | activation | target | error
1         |    |    |    |    |    |    |    |    |    |     |     |     |            |        |
2         |    |    |    |    |    |    |    |    |    |     |     |     |            |        |
3         |    |    |    |    |    |    |    |    |    |     |     |     |            |        |
4         |    |    |    |    |    |    |    |    |    |     |     |     |            |        |
5         |    |    |    |    |    |    |    |    |    |     |     |     |            |        |
6         |    |    |    |    |    |    |    |    |    |     |     |     |            |        |
7         |    |    |    |    |    |    |    |    |    |     |     |     |            |        |
8         |    |    |    |    |    |    |    |    |    |     |     |     |            |        |
9         |    |    |    |    |    |    |    |    |    |     |     |     |            |        |
10        |    |    |    |    |    |    |    |    |    |     |     |     |            |        |
Comment on the performance of your regmodl3 model on the test set.
For this Part II, INCLUDE IN YOUR WRITTEN REPORT the complete description (number of layers, PEs per layer, activation functions, optimizer used, loss function used, metrics used) of each of the models (regmodl1, regmodl2, regmodl3) and a combined plot of the training and validation losses per epoch (as an example, see Figure 4.4 in page 103 of the textbook) obtained from each model.
WRITE BRIEF EXPLANATIONS OF THE REASONING you followed to change the model from regmodl1 to regmodl2 and then from regmodl2 to regmodl3 and their training parameters.
Additionally, for Part II, you will include these Program/Data Files in your Canvas submission:
III.0 Find out, and indicate in your report, an estimate of the ACCURACY (“HIT RATIO”) OF A “RANDOM CLASSIFIER” ON THE VALIDATION SET. (See an example of how to find this practical estimate using Python on page 111 of our textbook: the 5 commands just before the heading of Section 4.2.5.)
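In the spirit of the textbook's estimate, one can shuffle a copy of the validation labels and measure how often the shuffled copy agrees with the true labels (a sketch; the function name is illustrative):

```python
import numpy as np

def random_classifier_accuracy(val_levels, seed=45):
    """Practical estimate of a random classifier's hit ratio: shuffle a
    copy of the validation labels and count agreements with the originals."""
    levels = np.asarray(val_levels)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(levels)
    return float(np.mean(shuffled == levels))
```

Because the MEDIUM level dominates this data set, the estimate will likely land well above the naive 1/4 = 25%.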
III.1 Develop a (very simple) model (clasmodl1) – This model must only have 1 hidden layer, and no more than 8 processing elements in that layer. – Is this model achieving better accuracy than a “random classifier”?
III.2 Develop a (better) model that actually overfits (clasmodl2) – There must be at least 2 hidden layers.
III.3 Use your iterative observations of the performance of the model in III.2 to modify / tune hyperparameters (for example, deciding how many epochs of training to allow) to arrive at a “Final Classification Model” (clasmodl3), which will be considered the “best model” you could develop to solve this classification task. There must be at least 2 hidden layers in this model.
For each one of these parts (III.1, III.2 and III.3) you must perform the fitting of the corresponding model including validation data, so you can see AND INCLUDE IN YOUR REPORT the plots comparing training and validation loss per epoch. During the fitting of the model, you must monitor
the ACCURACY as a metric. – INDICATE (NUMERICALLY) THE FINAL LOSS & ACCURACY ON THE TRAINING SET, AND THE FINAL LOSS & ACCURACY ON THE VALIDATION SET.
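A sketch of a network of this kind (assuming Keras; the hidden-layer sizes are placeholders, not a recommendation): four softmax output PEs, one per quality level, with the categorical cross-entropy loss that matches one-hot targets:

```python
from tensorflow import keras
from tensorflow.keras import layers

clasmodl = keras.Sequential([
    keras.Input(shape=(12,)),                # 11 attributes + 'type'
    layers.Dense(32, activation="relu"),     # hypothetical sizes
    layers.Dense(32, activation="relu"),
    layers.Dense(4, activation="softmax")    # probabilities over the 4 levels
])
clasmodl.compile(optimizer="rmsprop",
                 loss="categorical_crossentropy",  # matches one-hot targets
                 metrics=["accuracy"])             # ACCURACY, as required
```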
Once you have defined and fitted your “best model” (clasmodl3), use the method “predict” to find the values of the output activations that your clasmodl3 generates for each of the 10 test patterns, and compare them to the 10 targets of the test set. With all that information, fill in a table like this:
Pattern # | f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 | f10 | f11 | f12 | activations | target | Hit?(Y/N)
1         |    |    |    |    |    |    |    |    |    |     |     |     |             |        |
2         |    |    |    |    |    |    |    |    |    |     |     |     |             |        |
3         |    |    |    |    |    |    |    |    |    |     |     |     |             |        |
4         |    |    |    |    |    |    |    |    |    |     |     |     |             |        |
5         |    |    |    |    |    |    |    |    |    |     |     |     |             |        |
6         |    |    |    |    |    |    |    |    |    |     |     |     |             |        |
7         |    |    |    |    |    |    |    |    |    |     |     |     |             |        |
8         |    |    |    |    |    |    |    |    |    |     |     |     |             |        |
9         |    |    |    |    |    |    |    |    |    |     |     |     |             |        |
10        |    |    |    |    |    |    |    |    |    |     |     |     |             |        |
Comment on the performance of your clasmodl3 model on the test set.
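The Hit?(Y/N) column can be filled by comparing the argmax of the output activations with the argmax of the one-hot target; a sketch (the helper name is illustrative):

```python
import numpy as np

def hit_column(activations, one_hot_targets):
    """'Y' where the strongest activation matches the target class."""
    pred = np.argmax(activations, axis=1)
    true = np.argmax(one_hot_targets, axis=1)
    return ["Y" if p == t else "N" for p, t in zip(pred, true)]
```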
For this Part III, INCLUDE IN YOUR WRITTEN REPORT the complete description (number of layers, PEs per layer, activation functions, optimizer used, loss function used, metrics used) of each of the models (clasmodl1, clasmodl2, clasmodl3) and a combined plot of the training and validation losses per epoch (as an example, see Figure 4.4 in page 103 of the textbook) obtained from each model.
WRITE BRIEF EXPLANATIONS OF THE REASONING you followed to change the model from clasmodl1 to clasmodl2 and then from clasmodl2 to clasmodl3 and their training parameters.
Additionally, for Part III, you will include these Program/Data Files in your submission:
Finally, IN YOUR WRITTEN REPORT, you must also include a final section titled “CONCLUSIONS”, where you describe what you learned, what happened as expected, what happened differently from the way it was expected, how to further improve both (regression and classification) models, whether additional pre-processing of the data would have been beneficial, etc.
AT THE END OF YOUR WRITTEN REPORT, YOU MUST INCLUDE, in an APPENDIX titled “CODE”:
YOUR WRITTEN REPORT MUST BE SUBMITTED TO THE INSTRUCTOR, ON THE DUE DATE, DURING CLASS TIME.
IN ADDITION, YOU MUST UPLOAD TO CANVAS (TO THE CORRESPONDING “ASSIGNMENT”:
PROJECT_01), A SINGLE ZIP ARCHIVE FILE THAT MUST CONTAIN ALL THE SOFTWARE DELIVERABLES FOR PARTS [II] AND [III] OF THIS PROJECT. ON-TIME CANVAS SUBMISSIONS WILL BE CLOSED AT 11:59 PM ON THE DUE DATE FOR THIS PROJECT.