Question 2 is related to the topics on Introduction to the Modelling Process and Linear Regression

Statistics

Question 2 is related to the topics on Introduction to the Modelling Process and Linear
Regression. You are required to utilise the tidymodels framework for this question.
dataset: bikes
This question will use a real-world example from Capital Bikeshare, a bike rental program
providing service to the Washington, D.C., area. The dataset includes daily bicycle rental
information for the two-year period from 2011–2012. Imagine that we were hired by the
mayor’s office in Washington, D.C., to help them deal with a growing traffic congestion
problem. The city introduced a low-cost bike-sharing program in an attempt to reduce the
number of cars on the roads. However, after some early successes, the city has started to receive
an increasing number of complaints about bike shortages on certain days and an oversupply of
bikes on other days. In an attempt to address the problem, the city decided to partner with a
national bike rental company to manage the supply of bikes to the city. As part of the
partnership agreement, the city will need to provide to the bike rental company daily estimates
of demand for the entire city. Since the inception of the program, the city has collected
information on the number of bikes rented daily, along with corresponding weather and
seasonal data.
The dataset includes several weather-related variables for our analysis:
• temperature is the average daily air temperature in degrees Fahrenheit.
• humidity is the average daily humidity, expressed as a decimal number ranging from
0.0 to 1.0.
• windspeed is the average daily wind speed, in miles per hour.
• realfeel is a measurement derived from temperature, humidity, cloud cover, and other
weather factors to describe the temperature perceived by a person outdoors. It is
measured in degrees Fahrenheit.
• weather is a categorical variable used to describe the weather conditions, using the
following scale:
1: Clear or partly cloudy
2: Light precipitation
3: Heavy precipitation
In addition to this weather information, we also have some variables that describe
characteristics of each day. These include the following:
• date is the calendar day described in each instance, including the day,
month, and year.
• season is the calendar season for the record, expressed as follows:
1: winter
2: spring
3: summer
4: fall
• weekday is the day of the week for the record, expressed as an integer ranging from 0
(Sunday) through 6 (Saturday).
• holiday is a binary variable that is 1 if the day was a holiday and 0 otherwise.
Finally, the dataset includes the outcome variable:
• rentals is the number of bicycle rental transactions that occurred during the
given day.
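The coding scheme described above can be sketched in R as follows. This is a minimal illustration, not part of the assignment: the `bikes` tibble built here is a simulated stand-in (the values are invented and only the column names follow the text), and the factor labels are assumptions chosen for readability. It converts the integer-coded variables to factors and prints the pairwise correlations of the quantitative variables.

```r
library(dplyr)

# Hypothetical stand-in for the real `bikes` data: values are simulated,
# only the column names and coding scheme follow the description above.
set.seed(1)
n <- 200
bikes <- tibble(
  date        = as.Date("2011-01-01") + 0:(n - 1),
  season      = sample(1:4, n, replace = TRUE),
  holiday     = rbinom(n, 1, 0.05),
  weekday     = as.integer(format(date, "%w")),  # 0 = Sunday ... 6 = Saturday
  weather     = sample(1:3, n, replace = TRUE, prob = c(0.6, 0.3, 0.1)),
  temperature = runif(n, 20, 95),
  realfeel    = temperature + rnorm(n, sd = 3),
  humidity    = runif(n, 0.2, 1),
  windspeed   = runif(n, 0, 25),
  rentals     = round(pmax(0, 1000 + 60 * temperature + rnorm(n, sd = 500)))
)

# Treat the integer codes as categories rather than as numbers.
# The label names are assumptions made for this sketch.
bikes <- bikes %>%
  mutate(
    season  = factor(season, levels = 1:4,
                     labels = c("winter", "spring", "summer", "fall")),
    weather = factor(weather, levels = 1:3,
                     labels = c("clear", "light_precip", "heavy_precip")),
    weekday = factor(weekday, levels = 0:6,
                     labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")),
    holiday = factor(holiday, levels = 0:1, labels = c("no", "yes"))
  )

# Pairwise correlations between the quantitative variables only
bikes %>%
  select(temperature, realfeel, humidity, windspeed, rentals) %>%
  cor() %>%
  round(2)
```

Converting the codes to factors keeps a linear model from treating, for example, season 4 as "twice" season 2.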
As consultants to the mayor, our task is to use this observed data to develop a model that
predicts the daily demand for bike rentals across the entire city based upon some or all of the
other provided characteristics. This will help potential partners predict the demand for bicycles
on a given day, allowing them to both forecast revenue and ensure that sufficient bicycles are
on the street to meet rider demand.
Required:
You are required to perform a multiple linear regression to develop a predictive model on
factors influencing rentals. The independent variables considered in the model will be all the
variables stated above (except for rentals, the outcome variable, and date). In order to develop
this model, you have to perform the following steps:
i) Visualise the bivariate correlations between the quantitative variables. What do these
correlations indicate about the relationships between the independent variables and the
dependent variable, and among the independent variables themselves? Can you detect a
possible violation of the regression assumptions based on this output?
(3 marks)
ii) Present relevant graphs to visualise the relationship between each quantitative
independent variable and the dependent variable. Briefly interpret these
relationships.
(3 marks)
iii) Use a function from the tidymodels package to split the data into training and testing data
sets.
(1 mark)
iv) Use the tidymodels ecosystem and conduct the following on the training dataset:
a. Perform cross-validation analyses on two multiple regression models. One model
(Model 1) includes all the independent variables stated above (except for date). For
the other model (Model 2), use the same variables but add second-degree polynomial
terms for some of the quantitative variables, based on the visualisations produced
in (ii).
(5 marks)
b. Prior to performing the cross-validation, conduct some relevant model pre-processing
using functions from the recipes and tune packages. Include at least one use of a
recipes function and one use of a tune function for each model.
(4 marks)
c. Evaluate both models to determine the best model.
(2 marks)
v) Use the best model identified in the cross-validation analysis and test its predictive
ability on the testing dataset.
(2 marks)
vi) Based on the training data, use the same independent variables as Model 1 (the model
without polynomial terms), but add at least one relevant interaction term to the model.
You may refer to this model as Model 3. Present the results in a tidy format.
(2 marks)
vii) Interpret the coefficients of Model 3 in terms of the influence of the independent variables
on rentals. When interpreting the results, you should summarise similar findings together.
(6 marks)
(Total Marks for Question 2 = 28 marks)
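
As a rough orientation for steps (iii) to (vi), the workflow could be sketched along the following lines. This is one possible outline under stated assumptions, not a model answer: the `bikes` tibble is simulated (the real data would also need `date` removed), the choice of `step_corr()` as the tuned pre-processing step, the threshold grid, the variables given polynomial terms, and the `temperature:humidity` interaction are all illustrative choices that the brief does not prescribe.

```r
library(tidymodels)

# Simulated stand-in for `bikes` (assumption: invented values, no date column)
set.seed(123)
n <- 400
bikes <- tibble(
  season      = factor(sample(c("winter", "spring", "summer", "fall"), n, replace = TRUE)),
  holiday     = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.9, 0.1))),
  weekday     = factor(sample(0:6, n, replace = TRUE)),
  weather     = factor(sample(1:3, n, replace = TRUE, prob = c(0.6, 0.3, 0.1))),
  temperature = runif(n, 20, 95),
  realfeel    = temperature + rnorm(n, sd = 3),
  humidity    = runif(n, 0.2, 1),
  windspeed   = runif(n, 0, 25),
  rentals     = 500 + 150 * temperature - 1.1 * temperature^2 -
                1500 * humidity - 30 * windspeed + rnorm(n, sd = 400)
)

# (iii) Train/test split
bikes_split <- initial_split(bikes, prop = 0.8)
bikes_train <- training(bikes_split)

# (iv b) Pre-processing: one recipe per model, each with a tuned threshold
rec1 <- recipe(rentals ~ ., data = bikes_train) %>%
  step_corr(all_numeric_predictors(), threshold = tune()) %>%
  step_dummy(all_nominal_predictors())

rec2 <- recipe(rentals ~ ., data = bikes_train) %>%
  step_poly(temperature, humidity, degree = 2) %>%   # second-degree terms
  step_corr(all_numeric_predictors(), threshold = tune()) %>%
  step_dummy(all_nominal_predictors())

lm_spec <- linear_reg() %>% set_engine("lm")
wf1 <- workflow() %>% add_recipe(rec1) %>% add_model(lm_spec)
wf2 <- workflow() %>% add_recipe(rec2) %>% add_model(lm_spec)

# (iv a) 5-fold cross-validation over an illustrative threshold grid
folds <- vfold_cv(bikes_train, v = 5)
grid  <- tibble(threshold = c(0.7, 0.9, 0.99))
res1  <- tune_grid(wf1, resamples = folds, grid = grid, metrics = metric_set(rmse, rsq))
res2  <- tune_grid(wf2, resamples = folds, grid = grid, metrics = metric_set(rmse, rsq))

# (iv c) Compare the best cross-validated RMSE of each model
rmse1 <- show_best(res1, metric = "rmse", n = 1)$mean
rmse2 <- show_best(res2, metric = "rmse", n = 1)$mean

best_wf <- if (rmse2 < rmse1) {
  finalize_workflow(wf2, select_best(res2, metric = "rmse"))
} else {
  finalize_workflow(wf1, select_best(res1, metric = "rmse"))
}

# (v) Refit on the training set and evaluate once on the test set
final_fit <- last_fit(best_wf, split = bikes_split)
collect_metrics(final_fit)

# (vi) Model 3: Model 1 predictors plus one interaction (illustrative choice)
rec3 <- recipe(rentals ~ ., data = bikes_train) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_interact(terms = ~ temperature:humidity)

model3 <- workflow() %>%
  add_recipe(rec3) %>%
  add_model(lm_spec) %>%
  fit(data = bikes_train)

model3 %>% extract_fit_parsnip() %>% tidy()
```

In the second recipe, `step_poly()` is placed before `step_corr()` so that the correlation filter cannot remove `temperature` before its polynomial terms are created. Because `temperature` and `realfeel` are close to collinear, the filter will usually drop one of them at lower thresholds.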
