Problem 6.
Gallop Marketing has been gathering data on people's television viewing habits in smaller metropolitan areas. Radhika Nanda, an analyst at Gallop, is trying to predict the number of households that tune in to a given television station at any time during a given calendar week. She has gathered data for 25 different stations/broadcast areas, and has run a simple linear regression model, where the number of households that tune in to a station (in 10,000s) sometime during the week is the dependent variable. The independent variable that she has used is the number of households (in 10,000s) with televisions in the broadcast area. The resulting regression model output appears below. Radhika has looked at the output and is discouraged with the results.
(a) Based on the above regression output, why might this regression not be a good model?
Solution:
R Square measures the proportion of the variation in the dependent variable that is explained by the independent variable. Here it is 0.1862, so only 18.62 percent of the variation in the number of households that tune in is explained by the number of households with televisions in the broadcast area, which is very low. More than 80 percent of the variation is left unexplained, so this regression is not a good model.
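As a rough illustration of how R Square is obtained, the sketch below computes it from observed and predicted values. The function name `r_squared` and the four data points are hypothetical and are not taken from Radhika's data set.

```python
def r_squared(observed, predicted):
    """Share of the variation in `observed` explained by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    # Unexplained variation: squared gaps between observed and predicted values.
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    # Total variation: squared gaps between observed values and their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observations and fitted values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(observed, predicted), 4))
```

A value near 1 means the fitted values track the observations closely; a value such as 0.1862 means most of the variation is left in the residuals.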
Radhika has decided to give her factors some more thought, and has come upon the idea that the number of households who tune in to a particular station during the week might also depend on whether or not the station's channel is VHF or UHF. For example, most VHF stations are major networks (like ABC, CBS, or NBC), which are viewed more often regardless of the size of the broadcast area. Radhika therefore has included a dummy variable for whether a station broadcasts on VHF (VHF = 1, UHF = 0).
The results of her multiple linear regression are as follows:
(b) Write a complete equation for the multiple linear regression that incorporates the estimated coefficients provided by the second regression output. Define in words all the variables used in the equation. Do the signs of the regression coefficients make sense? Why or why not?
Solution:
Y = α + β1X1 + β2X2 + ε
Y = 2.3886 + 0.6757X1 + 1.6729X2 + ε
Where Y - dependent variable of the study, that is, the number of households (in 10,000s) that tune in to a particular station sometime during the week,
X1 - number of households (in 10,000s) with televisions in the broadcast area,
X2 - dummy variable for whether the station broadcasts on VHF (VHF = 1, UHF = 0),
ε - error term (Observed - Predicted).
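Yes, the signs of the regression coefficients make sense. A positive coefficient means that the dependent variable (the number of households that tune in to a particular station) increases when the corresponding independent variable increases, while a negative coefficient would mean that it decreases. Here both coefficients are positive. The coefficient on X1 (0.6757) says that a broadcast area with more television households has more households tuning in, which is what we would expect: each additional 10,000 households with televisions is associated with about 6,757 more households tuning in, holding the channel type fixed. The coefficient on X2 (1.6729) says that a VHF station is predicted to draw about 16,729 more households than a UHF station in a broadcast area of the same size, which agrees with the observation that most VHF stations are major networks that are viewed more often.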
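To see what the fitted equation implies, the sketch below evaluates it for a given broadcast area. The coefficients are the ones in the equation above; the function name `predict_viewers` and the example area of 10 (that is, 100,000 television households) are hypothetical choices made only for illustration.

```python
# Estimated coefficients from the second regression (units: 10,000 households).
INTERCEPT = 2.3886
B_HOUSEHOLDS = 0.6757
B_VHF = 1.6729


def predict_viewers(households_10k, is_vhf):
    """Predicted households tuning in (in 10,000s), ignoring the error term."""
    vhf_dummy = 1 if is_vhf else 0  # VHF = 1, UHF = 0
    return INTERCEPT + B_HOUSEHOLDS * households_10k + B_VHF * vhf_dummy


# Hypothetical broadcast area with 100,000 television households.
vhf = predict_viewers(10, True)
uhf = predict_viewers(10, False)
print(round(vhf, 4), round(uhf, 4), round(vhf - uhf, 4))
```

For any fixed area size, the gap between the VHF and UHF predictions equals the dummy coefficient, which is how a dummy variable shifts the intercept without changing the slope.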
(c) What are the degrees of freedom of the second regression model? Compute the 98% confidence interval for its "Number of Households" and "UHF/VHF" coefficients.
Solution:
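The degrees of freedom for the regression are given by the number of coefficients minus one. Here we have 3 coefficients (including the intercept), so the regression has 3 - 1 = 2 degrees of freedom. The residual (error) degrees of freedom, which are the ones used for the confidence intervals, are the number of observations minus the number of coefficients, 25 - 3 = 22, and the total degrees of freedom are 25 - 1 = 24.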
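The degrees-of-freedom bookkeeping can be written out as a short sketch. The function name `regression_df` is hypothetical; the inputs are the 25 stations and the 2 independent variables described in the problem.

```python
def regression_df(n_observations, n_predictors):
    """Split the total degrees of freedom into regression and residual parts."""
    regression = n_predictors                      # coefficients minus the intercept
    residual = n_observations - n_predictors - 1   # observations minus all coefficients
    total = n_observations - 1
    return {"regression": regression, "residual": residual, "total": total}


print(regression_df(25, 2))
```

The regression and residual parts always add up to the total, which is a quick consistency check on any regression output.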
The confidence interval for a regression coefficient, based on the t-distribution, is given by the formula,
βi ± t(α, df) * SE(βi)
Where βi is the estimated regression coefficient (0.6757 for the number of households), SE is its standard error (0.2060), α is the upper-tail probability ((1 - 0.98)/2 = 0.02/2 = 0.01) and df is the residual degrees of freedom (n - k, where n is the number of observations and k is the number of coefficients, 25 - 3 = 22). The critical value is t(0.01, 22) = 2.5083. By plugging all the values into the formula we get,
0.6757 ± 0.2060 * 2.5083
Therefore the 98% confidence interval for the "Number of Households" coefficient is (0.1590, 1.1924).
Similarly, for the "UHF/VHF" coefficient we have the confidence interval,
1.6729 ± 0.2665 * 2.5083
(1.0044, 2.3414)
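The interval arithmetic can be checked numerically. The sketch below uses only the Python standard library: it finds the upper 0.01 critical value of the t-distribution with 22 degrees of freedom by numerical integration and bisection, then applies it to the two coefficients and standard errors quoted above. All function names are hypothetical, and the numerical method is just one simple way to avoid a t table.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, x], using symmetry about zero."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(upper_tail, df):
    """Value t with P(T > t) = upper_tail, located by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1.0 - t_cdf(mid, df) > upper_tail:
            lo = mid  # tail still too heavy: move right
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_interval(estimate, std_err, level, df):
    """Two-sided interval: estimate +/- t * standard error."""
    margin = t_critical((1 - level) / 2, df) * std_err
    return estimate - margin, estimate + margin


print(round(t_critical(0.01, 22), 4))
print([round(v, 4) for v in confidence_interval(0.6757, 0.2060, 0.98, 22)])
print([round(v, 4) for v in confidence_interval(1.6729, 0.2665, 0.98, 22)])
```

Neither interval contains zero, so both coefficients differ from zero at the 2% significance level.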
(d) Below are residual plots that Radhika has produced based on the second regression model. Do you see any problems with the model based on looking these plots? Why or why not?
Solution:
In Plot 1 we do not see any particular pattern: the residuals are scattered randomly above and below the zero line. So we may say that the error is randomly distributed.
A similar inference can be drawn from Plot 2. Since the UHF/VHF variable is a dummy variable, the residuals form two separate vertical lines (one at 0 and one at 1), but within each line they are spread evenly around the zero line, so the assumption about the error is not violated.
Similarly, in Plot 3 the residuals fall above and below zero without clustering in any particular interval, which is consistent with the assumption of normally distributed errors.
Since the plots show no violation of the regression assumptions, we can consider this a good model.
(e) Radhika has computed the sample correlation of the data for "Number of Households" and "UHF/VHF" data. The results of Radhika's correlation computations are shown below.
Number of Households (10,000s) UHF, VHF
Number of Households (10,000s) 1.0000
UHF, VHF 0.0730 1.0000
Do these numbers indicate any possible problems with the regression? Why or why not? In order to check the validity of the second regression model, describe what additional test you would perform, and why.
Solution:
The correlation matrix above allows us to check for multicollinearity. As a rule of thumb, if any two independent variables are highly correlated (a correlation above 0.8 or below -0.8), there is a multicollinearity problem. In our case the correlation between "Number of Households" and "UHF/VHF" is only 0.0730, so the two independent variables are nearly uncorrelated and there is no multicollinearity problem, which is further evidence of a good model.
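The rule-of-thumb check can be sketched in a few lines. The variance inflation factor (VIF) shown here is an additional diagnostic that is not part of the original solution; with exactly two independent variables it reduces to 1 / (1 - r^2), where r is their correlation. The function names are hypothetical.

```python
def exceeds_cutoff(r, cutoff=0.8):
    """Rule of thumb: flag multicollinearity when |r| is above the cutoff."""
    return abs(r) > cutoff


def vif_two_predictors(r):
    """Variance inflation factor when the model has exactly two predictors."""
    return 1.0 / (1.0 - r * r)


r = 0.0730  # sample correlation between Number of Households and UHF/VHF
print(exceeds_cutoff(r), round(vif_two_predictors(r), 4))
```

A VIF close to 1 means the standard errors of the coefficients are essentially not inflated by correlation between the predictors.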
The additional test to validate the regression model is the Durbin-Watson test, which checks whether the residuals are autocorrelated. The regression assumes that the errors are independent of one another, so if autocorrelation exists among the residuals, the independence assumption is violated.
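A minimal sketch of the Durbin-Watson statistic follows, assuming the residuals are listed in a meaningful order. The function name `durbin_watson` and the residual values are hypothetical and are not taken from the regression output.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residuals that alternate in sign (negative autocorrelation).
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
# Hypothetical residuals that drift slowly (positive autocorrelation).
print(durbin_watson([1.0, 0.9, 0.8, 0.7]))
```

The statistic lies between 0 and 4. Values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation.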