Question 1 Clean the ABS data set Download ABS data set

Business

Question 1: Clean the ABS data set

Download the ABS data set. Fill in the table in the template for question 2. Then explain the data cleaning process in fewer than 150 words. Address the following issues in your explanation:
• Categorical and numerical variables.
• The number of missing values.
• Outliers associated with every variable.
• How many observations remain after cleaning the data.

Question 2

Based on the clean data set from Question 1:

A. Use descriptive analytics techniques to identify the relationship between each variable and the variable absenteeism time in hours. Excluding absenteeism time in hours, there are twenty variables in the data set, so you should develop twenty figures and/or tables, one for the relationship between each variable and absenteeism time in hours. Based on these figures and/or tables, choose the six most relevant variables. Include the visuals (figures/tables) for those six variables in the answer sheet and interpret them in fewer than 150 words. (10 marks)

B. Develop a regression model with absenteeism time in hours as the output variable and the six variables identified in part A as the input variables. Present the regression table and the regression equation, and comment on both. The word limit is 150 words. (10 marks)

C. Try to increase the accuracy of the model over several iterations. Use whichever techniques you judge suitable; for example, you can include or exclude variables, or combine different levels of a categorical variable. Present a final regression equation and a final regression table, and interpret both. Explain how you increased the accuracy of the model. Use fewer than 300 words for this section. (20 marks)

Note: the accuracy of the model can be low.
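The cleaning and screening steps in Questions 1 and 2.A can be sketched in code. The example below is only an illustration under stated assumptions, not the assignment's prescribed method: it counts missing values per column, flags outliers with the common 1.5 × IQR rule (the brief does not say which outlier rule to use), and ranks numerical variables by the absolute Pearson correlation with the target. The column names `age`, `distance_km`, and `absent_hours` are hypothetical stand-ins for the real ABS columns, and the rows are made-up values.

```python
import math
import statistics


def count_missing(rows, columns):
    """Count entries that are None or an empty string, per column."""
    return {c: sum(1 for r in rows if r.get(c) in (None, "")) for c in columns}


def iqr_bounds(values, k=1.5):
    """Return the (low, high) fences of the k * IQR rule (an assumed outlier rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """List the values that fall outside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if v < low or v > high]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_by_correlation(rows, target, candidates):
    """Sort candidate columns by |r| with the target, strongest first."""
    y = [r[target] for r in rows]
    scores = {c: pearson([r[c] for r in rows], y) for c in candidates}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical rows; the real data set has twenty-one columns.
    rows = [
        {"age": 28, "distance_km": 5, "absent_hours": 2},
        {"age": 35, "distance_km": 12, "absent_hours": 4},
        {"age": None, "distance_km": 20, "absent_hours": 8},
        {"age": 41, "distance_km": 31, "absent_hours": 8},
        {"age": 50, "distance_km": 36, "absent_hours": 3},
    ]
    print(count_missing(rows, ["age", "distance_km", "absent_hours"]))
    complete = [r for r in rows if r["age"] is not None]
    print(rank_by_correlation(complete, "absent_hours", ["age", "distance_km"]))
```

Correlation only captures linear association between numerical variables, so categorical variables would still need tables or box plots, as the brief requests. Whether a flagged outlier is removed or kept remains a judgment call to explain in the write-up.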
Question 3

Based on the clean data set from Question 1, create a new column and name it high_absenteeism. If the absenteeism in hours is more than 8, high_absenteeism is equal to 1; otherwise it is equal to 0. Choose the six most relevant numerical variables as independent variables to develop a logistic regression model with high_absenteeism as the dependent variable.
• Partition the data as 70% and 30% for the training and the test set, respectively.

Present the logistic regression equation for the training and the test set. Comment on the logistic regression equations. Explain the procedure for selecting the most relevant variables. Use fewer than 150 words for this section.

Note: to use the app, you should save the file as CSV.

Note: try to delete all the irrelevant variables from the data set and include only the six independent variables and the one dependent variable. Otherwise, you increase the chance of crashing the app.
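The labelling, partitioning, and model-fitting steps of Question 3 can also be sketched in code. This is a minimal illustration under assumptions, not a reproduction of the app the brief refers to: the column name `absent_hours`, the random seed, the learning rate, and the use of plain batch gradient descent are all choices made here for the example. The fitted weights correspond to an equation of the form log(p / (1 - p)) = b0 + b1·x1 + ... + b6·x6, where p is the probability that high_absenteeism equals 1.

```python
import math
import random


def add_high_absenteeism(rows, hours_key="absent_hours", threshold=8):
    """Return copies of the rows with high_absenteeism = 1 if hours > threshold, else 0."""
    return [{**r, "high_absenteeism": 1 if r[hours_key] > threshold else 0} for r in rows]


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """Probability of class 1; w[0] is the intercept, w[1:] the coefficients."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit intercept and coefficients by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    # One standardized, made-up feature; the assignment uses six.
    X = [[-2.0], [-1.0], [1.0], [2.0]]
    y = [0, 0, 1, 1]
    b0, b1 = fit_logistic(X, y)
    print(f"log(p / (1 - p)) = {b0:.3f} + {b1:.3f} * x1")
```

Gradient descent with a fixed learning rate behaves best when the features are on similar scales, so standardizing the six variables first is a reasonable precaution. Fitting the same function separately on the training rows and the test rows gives the two equations the brief asks to compare.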
