Computer Science Mar 18, 2023

Section A) Multiple choice (10 Marks)

1. Choose the TRUE statement with respect to treating missing values

a. Dropping rows with missing values is not safer when the data set is large

b. Dropping rows with missing values is safer when the data set is small

c. Dropping rows with missing values is not safer when the data set is small

d. Never replace missing values with meaningful data

2. What is (are) the purpose(s) of data transformation?

a. to make data easier to model, and easier to understand

b. makes it easier to match patterns in the training data to patterns in new data

c. Treats the missing values

d. Replaces all NAs with meaningful values

3. Calibration data set is part of

a. Training data

b. Test Data

c. Normalized test data

d. Random data

4. Choose the function that finds the frequency distribution of a variable in a data set

a. table()

b. barplot()

c. freq()

d. mean()

5. Faceting is useful for

a. Breaking the distributions into different graphs

b. Creating exactly two graphs superimposed

c. Dividing the data into bins

d. Showing relationship between two categorical variables

6. Choose the data types in R that support Boolean values

a. Logical

b. Numeric

c. Character

d. Integer

7. Which of the following is TRUE with respect to NULL

a. It is a zero-length vector

b. It can be created using c() with no arguments

c. We don't know the value

d. It is used for missing values

8. Which of the following statistical measures indicates whether you can accept the hypothesis of a predictor model?

a. P-value

b. R-squared

c. Std. error

d. T-value

9. Choose the statement(s) that are valid with respect to the K-means clustering algorithm

a. It works when the data is all numeric and the distance metric is squared Euclidean

b. The value of K must be known in advance

c. It is slower than hierarchical clustering

d. It works on all types of data

10. Which of the following is (are) supervised learning methods?

a. Logistic regression

b. K-means

c. Linear Regression

d. Association rules

Section B: Short Descriptive questions - 20 marks

11. Write short notes on a) Hexbin plot and b) Shadow plot (2 Marks)

12. How will you replace the missing values with meaningful information? By what means will you identify the altered data points? (2 Marks)

13. What are quartiles and percentiles? What is the command to find them in R? Give an example of each (4 Marks)

14. List the categories of modeling methods and give an example for each category (4 Marks)

15. What is an association rule? Give a suitable example (2 Marks)

16. Write the two-by-two confusion matrix (2 Marks)

17. Discuss random forests. How do they differ from a basic decision tree? (4 Marks)

Section C: Essay type questions - 20 Marks

18. Explain k-fold cross-validation and its purpose with an example (5 Marks)

19. Discuss linear and logistic regression with a proper example of each (5 Marks)

20. Write in detail about the takeaways from various clustering methods (5 Marks)

21. After building a machine learning model in R, what will be revealed by the summary() function when passing the model as input? (5 Marks)
