Open book applied machine learning 2020 exam
Answer any 3 questions from 4; all questions carry equal marks.
Question 1
(a) The following table presents the Pearson correlation coefficients from a dataset.

(i) Evaluate the table for potential multicollinear attributes. Explain the reasoning behind the choices you have made.
(ii) Evaluate the table for attribute selection. Explain the reasoning behind the potential attributes that you have selected, based on the Pearson coefficients.
(6 Marks)
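A minimal Python sketch of the two screening steps in part (a), assuming a hypothetical cut-off of |r| >= 0.8 for multicollinearity and |r| >= 0.3 for a useful attribute-target relationship (neither threshold comes from the exam). The helper names and the attribute values are invented for illustration; they are not the exam's table.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def multicollinear_pairs(features, threshold=0.8):
    """Attribute pairs whose |r| meets the assumed multicollinearity threshold."""
    pairs = []
    for a, b in combinations(sorted(features), 2):
        r = pearson(features[a], features[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


def candidate_attributes(features, target, threshold=0.3):
    """Attributes whose |r| with the target meets the assumed threshold, strongest first."""
    scores = {name: pearson(col, target) for name, col in features.items()}
    kept = [(name, round(r, 3)) for name, r in scores.items() if abs(r) >= threshold]
    return sorted(kept, key=lambda item: -abs(item[1]))


# Hypothetical data, invented for illustration only.
features = {
    "weight": [2.0, 3.0, 4.0, 5.0, 6.0],
    "engine_size": [1.1, 1.9, 3.2, 3.9, 5.1],
    "colour_code": [2.0, 5.0, 1.0, 4.0, 3.0],
}
target = [40.0, 35.0, 28.0, 24.0, 18.0]
print(multicollinear_pairs(features))
print(candidate_attributes(features, target))
```

Here weight and engine_size both correlate strongly with the target but also with each other, so keeping both adds little new information; colour_code is barely related to anything and would be a weak candidate.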
(b) The histogram presented represents 398 cars surveyed for their fuel efficiency (miles per gallon).

(i) Evaluate the histogram for potential outliers or data that may be deemed missing. Explain the reasoning behind the choices you have made (provide the steps and calculations you used to support your decisions).

(ii) Explain how you would deal with the findings from part (i).
(8 Marks)
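A minimal Python sketch of one common way to do the screening in part (b), assuming Tukey's 1.5 x IQR fences for outliers and assuming that a reading of 0 mpg is a placeholder for "not recorded". Median imputation is shown as one possible treatment, not the only one. The helper names and readings are hypothetical, not the exam's histogram.

```python
from statistics import median, quantiles


def iqr_fences(values, k=1.5):
    """Tukey fences: values outside [Q1 - k*IQR, Q3 + k*IQR] are flagged."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def screen_mpg(raw, placeholder_values=(0,)):
    """Split raw readings into missing indices, outliers and clean values.

    None and any placeholder value (0 mpg is assumed to mean "not recorded")
    are treated as missing rather than as genuine measurements.
    """
    missing = [i for i, v in enumerate(raw) if v is None or v in placeholder_values]
    present = [v for i, v in enumerate(raw) if i not in missing]
    low, high = iqr_fences(present)
    outliers = [v for v in present if v < low or v > high]
    clean = [v for v in present if low <= v <= high]
    return missing, outliers, clean


# Hypothetical readings, not the exam's 398-car data.
raw = [18.0, 15.0, 0, 16.0, 17.0, None, 15.0, 14.0, 24.0, 22.0, 18.0, 21.0, 27.0, 26.0, 46.6]
missing, outliers, clean = screen_mpg(raw)
fill_value = median(clean)  # one simple imputation choice for the missing entries
print(missing, outliers, fill_value)
```

A flagged value such as 46.6 still needs a judgement call: it may be a data-entry error or a genuinely efficient car, so it is checked rather than deleted automatically.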
(c) The pre-examination of the class distribution is an important exercise before developing classification models.
(i) Discuss this statement, explaining why this is an appropriate pre-examination technique, and discuss the implications of not conducting it.
(ii) Provide examples of problem situations where this technique would be useful when examining the model's performance.
(6 Marks)
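A minimal Python sketch of the class-distribution check in part (c). The helper name and the fraud-detection labels are hypothetical; the point is the majority-class baseline, which is the accuracy a model gets by always predicting the most common class.

```python
from collections import Counter


def class_distribution(labels):
    """Each class's share, the majority-class baseline accuracy and the imbalance ratio."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {label: count / total for label, count in counts.items()}
    baseline = max(counts.values()) / total  # accuracy of always predicting the majority class
    imbalance_ratio = max(counts.values()) / min(counts.values())
    return shares, baseline, imbalance_ratio


# Hypothetical labels: 10 fraudulent transactions among 1,000.
labels = ["fraud"] * 10 + ["legit"] * 990
shares, baseline, ratio = class_distribution(labels)
print(shares, baseline, ratio)
```

On these labels a model reporting 99% accuracy has learned nothing beyond the majority class, which is why the distribution is checked before accuracy is interpreted.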
Question 2
(a) The terms type I error and type II error are often discussed when model performance is presented.

(i) Explain how you would evaluate a type I error and a type II error.

(ii) Given that the model is trying to identify patients with a life-threatening disease, discuss this problem situation with respect to both types of error, explaining which you think is the more important error in this case and why.
(6 Marks)
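A minimal Python sketch of counting the two error types from predictions, assuming the positive class (1) means "has the disease". A type I error is a false positive and a type II error is a false negative. The helper name and the screening results are hypothetical.

```python
def error_types(y_true, y_pred, positive=1):
    """Count type I errors (false positives) and type II errors (false negatives)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    negatives = sum(1 for t in y_true if t != positive)
    positives = sum(1 for t in y_true if t == positive)
    return {
        "type_I": fp,
        "type_II": fn,
        "type_I_rate": fp / negatives if negatives else 0.0,   # false positive rate
        "type_II_rate": fn / positives if positives else 0.0,  # false negative rate
    }


# Hypothetical screening results: 1 = has the disease, 0 = healthy.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(error_types(y_true, y_pred))
```

Reporting the two rates separately makes the trade-off in part (ii) visible instead of hiding it inside a single accuracy figure.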
(b) The following table contains the performance results (accuracy, sensitivity and specificity) for classification models A and B, where both models are trying to identify sports injuries before they happen.
Assumption: Model B is the most suitable model for predicting sports injuries before they happen.


(i) Explain why someone would make this incorrect assumption, using the values presented in the table above to aid your answer.

(ii) Explain why Model A is the most suitable model for predicting sports injuries before they happen, using the values presented in the table above to aid your answer.
(6 Marks)
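A minimal Python sketch of how accuracy, sensitivity and specificity are computed from confusion-matrix counts, and how a high accuracy can hide a low sensitivity when injuries are rare. The counts for "model X" and "model Y" are invented and are not the values from the exam's table; the helper name is hypothetical.

```python
def summarise(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of real injuries that were flagged
        "specificity": tn / (tn + fp),  # share of non-injuries correctly cleared
    }


# Invented counts for 1,000 athletes, 50 of whom go on to be injured.
# These are NOT the values from the exam's table.
model_x = summarise(tp=10, fn=40, tn=945, fp=5)
model_y = summarise(tp=45, fn=5, tn=855, fp=95)
print(model_x)
print(model_y)
```

Model X wins on accuracy (0.955 against 0.9) only because injuries are rare; Model Y flags 45 of the 50 injuries instead of 10, which is the purpose of the task.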
(c) Ten-fold cross-validation: machine learning model validation techniques (the best technique to use).

(i) Explain what is the most important

(ii) Explain an alternative to ten-fold cross-validation. Compare and contrast the two techniques (ten-fold cross-validation and the alternative technique), giving examples of problem situations where each technique may be more suitable.
(8 Marks)
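A minimal Python sketch of k-fold index splitting, with leave-one-out cross-validation shown as one possible alternative (the exam does not name one). The helper names and the fixed shuffle seed are illustrative assumptions; in practice a library splitter would normally be used.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are shuffled once and dealt into k folds of near-equal size;
    each fold serves exactly once as the test set.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def leave_one_out_indices(n):
    """Leave-one-out CV is the special case k = n: one sample per test fold."""
    return k_fold_indices(n, n)


splits = list(k_fold_indices(n=25, k=10))
print(len(splits), [len(test) for _, test in splits])
```

Ten folds train ten models; leave-one-out trains n models, which uses almost all the data for training each time but quickly becomes expensive as n grows, so it tends to suit very small datasets.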
Question 3
(a) The k-value in the KNN classification algorithm can be selected using the elbow method.

(i) Explain why you would initially choose the k-value to be even or odd.

(ii) Explain how you would evaluate the most appropriate k-value for a KNN algorithm using the above figure and the elbow method.


 Marks)
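A minimal Python sketch of the data behind an elbow plot for KNN: the validation error rate for each candidate k, which would then be plotted and the k at the bend chosen. It assumes a single numeric feature and absolute distance to stay short; the helper names and data are hypothetical. It also shows why an odd k is usually preferred in a two-class problem: an even k can produce a tied vote.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k nearest 1-D training points; a tied vote returns None."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = Counter(label for _, label in nearest).most_common()
    if len(votes) > 1 and votes[0][1] == votes[1][1]:
        return None  # tie: possible with an even k in a two-class problem
    return votes[0][0]


def error_curve(train, validation, ks):
    """Validation error rate for each candidate k (an unresolved tie counts as an error)."""
    curve = {}
    for k in ks:
        wrong = sum(knn_predict(train, x, k) != y for x, y in validation)
        curve[k] = wrong / len(validation)
    return curve


# Hypothetical two-class, one-feature data.
train = [(1.0, "a"), (1.5, "a"), (2.0, "a"), (2.5, "b"),
         (6.0, "b"), (6.5, "b"), (7.0, "b"), (3.0, "a")]
validation = [(1.2, "a"), (2.2, "a"), (6.2, "b"), (6.8, "b")]
print(error_curve(train, validation, ks=range(1, 8, 2)))  # odd k only
print(knn_predict(train, 2.75, k=2))  # even k: the two neighbours disagree
```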
(b) The naïve Bayes machine learning algorithm is often a high-performing classification algorithm.

(i) Explain why the Bayesian-based algorithm includes "naïve" in its title, and how this may affect the model's performance.

(ii) Compare and contrast the naïve Bayes algorithm with two other machine learning algorithms.
(8 Marks)
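A minimal Python sketch of a categorical naïve Bayes classifier with add-one (Laplace) smoothing, assuming discrete feature values. The "naïve" step is the assumption that features are conditionally independent given the class, so the class score is the prior times a product of per-feature likelihoods (summed here as logarithms to avoid underflow). The helper names and the weather data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[label][i][value]: how often feature i takes value within class label
    value_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[label][i][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values, alpha


def predict_naive_bayes(model, row):
    """Return the class maximising log P(c) + sum_i log P(x_i | c)."""
    class_counts, value_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior
        for i, value in enumerate(row):
            n_values = len(feature_values[i])
            likelihood = (value_counts[label][i][value] + alpha) / (n_label + alpha * n_values)
            score += math.log(likelihood)  # naive step: features treated as independent
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical data: (outlook, temperature) -> activity.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool"), ("sunny", "cool")]
labels = ["play", "play", "stay", "stay", "play"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("sunny", "mild")))
```

If two features are strongly dependent (for example, duplicated attributes), the product counts their evidence twice, which is one way the independence assumption can hurt performance.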
(c) Semi-supervised learning is an approach that is sometimes required, combining both supervised learning and unsupervised learning.

(i) Describe a problem situation where semi-supervised learning is required.

(ii) Explain why this approach is needed for your answer to part (i), describing why supervised learning or unsupervised learning alone would be unable to address the problem situation mentioned.


 Marks)
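A minimal Python sketch of self-training (pseudo-labelling), one simple semi-supervised scheme, assuming a nearest-neighbour rule on a single numeric feature. The idea is that a handful of expert labels spread through clusters found in the unlabelled data. The scan-measurement framing, helper name and data are hypothetical.

```python
def self_train(labelled, unlabelled):
    """Nearest-neighbour self-training on 1-D points.

    Repeatedly takes the unlabelled point closest to any labelled point,
    gives it that neighbour's label (a pseudo-label) and adds it to the
    labelled set.
    """
    labelled = list(labelled)
    remaining = list(unlabelled)
    while remaining:
        # Pair each unlabelled point with its nearest labelled point,
        # then pick the pair with the smallest distance.
        x, (_, label) = min(
            ((u, min(labelled, key=lambda p: abs(p[0] - u))) for u in remaining),
            key=lambda pair: abs(pair[0] - pair[1][0]),
        )
        labelled.append((x, label))
        remaining.remove(x)
    return labelled


# Hypothetical scan measurements: only two have expert labels.
labelled = [(1.0, "benign"), (9.0, "malignant")]
unlabelled = [1.5, 2.0, 2.4, 8.1, 8.7, 7.5]
print(self_train(labelled, unlabelled))
```

Two labels alone are too few for a reliable supervised model, and clustering alone finds the groups without naming them; combining the two assigns a label to each cluster.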
Question 4
(a) The hyperparameters batch size and epochs are fundamental in the development of an artificial neural network (ANN).
(i) Explain how you would evaluate and select a suitable batch size and number of epochs.
(6 Marks)
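A minimal Python sketch of two quantities often used when choosing these hyperparameters: the number of weight updates per epoch for each candidate batch size, and an early-stopping rule that picks the epoch with the lowest validation loss. The patience of 3, the helper names and the loss curve are assumptions for illustration; in practice each candidate batch size would produce its own validation curve from a real training run.

```python
import math


def updates_per_epoch(n_samples, batch_size):
    """Weight updates in one epoch (the last batch may be smaller)."""
    return math.ceil(n_samples / batch_size)


def pick_epochs(val_losses, patience=3):
    """Early stopping: return the 1-based epoch with the lowest validation loss,
    stopping once `patience` epochs pass without improvement."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


# Hypothetical validation-loss curve: improves, then starts to overfit.
losses = [0.90, 0.70, 0.55, 0.48, 0.47, 0.49, 0.52, 0.56, 0.60]
for batch_size in (16, 64, 256):
    print(batch_size, updates_per_epoch(10_000, batch_size))
print(pick_epochs(losses))
```

Smaller batches give more (noisier) updates per epoch; larger batches give fewer, smoother updates and may need more epochs to reach the same validation loss.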
(b) Bootstrap is a statistical estimation technique where a statistical quantity, such as a mean, is estimated from multiple random samples of your data (drawn with replacement).

(i) Discuss this statement, explaining in your own words how the bootstrap resampling technique works.

(ii) Provide an example of a problem situation where this technique should be considered, explaining why you think bootstrap is suitable for this problem situation.
(6 Marks)
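A minimal Python sketch of the bootstrap described in part (b): resample the data with replacement many times, recompute the mean each time, and read a percentile confidence interval off the sorted estimates. The 2,000 resamples, the 95% level, the fixed seed, the helper name and the small patient sample are all assumptions for illustration.

```python
import random
from statistics import mean


def bootstrap_ci(data, stat=mean, n_resamples=2000, level=0.95, seed=42):
    """Percentile bootstrap confidence interval for `stat`.

    Each resample draws len(data) values with replacement from the data;
    the spread of the statistic across resamples estimates its uncertainty.
    """
    rng = random.Random(seed)
    estimates = sorted(
        stat(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return stat(data), estimates[lo_idx], estimates[hi_idx]


# Hypothetical small sample, e.g. recovery times (days) for 12 patients.
sample = [12, 15, 9, 20, 14, 11, 30, 13, 16, 10, 18, 14]
point, lower, upper = bootstrap_ci(sample)
print(point, lower, upper)
```

With only 12 observations and one large value (30), a normal-theory interval is questionable; the bootstrap makes no distributional assumption beyond the sample being representative.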
(c) Statistical testing is often used to compare the performance of two or more machine learning models.

(i) Compare and contrast any two methods of statistical testing for comparing two or more machine learning models.

When reporting a statistical test result, many people do not present the entire picture, which calls the findings into question.

(ii) Discuss the most important parts of a statistical test to report so that there is no ambiguity in the findings, explaining your reason for each part selected.
(8 Marks)
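A minimal Python sketch of McNemar's test, one possible method for comparing two classifiers evaluated on the same test set (a paired t-test over cross-validation folds would be another). It uses the continuity-corrected statistic and the fact that, for a chi-squared variable with 1 degree of freedom, P(X > s) = erfc(sqrt(s / 2)). The disagreement counts, alpha and helper name are hypothetical; the final line prints the test name, statistic, degrees of freedom, p-value, alpha and sample size together.

```python
import math


def mcnemar(b, c):
    """McNemar's test with continuity correction.

    b: cases model 1 got right and model 2 got wrong
    c: cases model 1 got wrong and model 2 got right
    Returns the chi-squared statistic (1 degree of freedom) and its p-value.
    """
    if b + c == 0:
        return 0.0, 1.0  # the models never disagree
    statistic = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(statistic / 2))  # chi-squared(1) survival function
    return statistic, p_value


# Hypothetical disagreement counts on a shared test set.
b, c = 20, 5
stat, p = mcnemar(b, c)
alpha = 0.05
print(f"McNemar chi2(1) = {stat:.2f}, p = {p:.4f}, alpha = {alpha}, discordant pairs = {b + c}")
```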