Name: _ SID: __
This is an individual exam. Cases of collaboration/copying will be given scores of zero.
Please print your name below to indicate that you understand that this exam is to be completed alone, without outside assistance, and that you understand the penalty for failing to do so:
Printed Name:
We estimate the following model where $price_{jt}$ is the price of sandwich j at time t, and is a function of the calories of sandwich j at time t (which is measured as the difference between each sandwich's calories and 40 calories) using a dataset with 602 observations.
The OLS regression line is
$$\widehat{price_{jt}} = 3.5 + 0.185 \times (Calories_{jt} - 40)$$
$$(0.9)~~~(0.044)~~~~~~~~~~~~~~$$
where standard errors of the estimated coefficients are in parentheses.
(a) (4 Points) Please construct a 95% confidence interval of the predicted average price of a sandwich with 40 calories.
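➡️ At 40 calories the regressor $(Calories_{jt}-40)$ equals 0, so the predicted average price is the intercept, $\hat{\beta}_0=3.5$, with standard error $0.9$.
The degrees of freedom for 602 observations are $df=602-2=600$, so the t distribution is well approximated by the standard normal.
The z-score corresponding to the 95% confidence interval is $z_{\frac{\alpha}{2}}=1.95996$
Therefore, the 95% confidence interval is $\hat{\beta}_0 \pm 1.95996* SE$
$\Rightarrow(3.5-1.95996*0.9,3.5+1.95996*0.9)$ $\Rightarrow(1.736036 , 5.263964)$
3.5-1.95996*0.9
3.5+1.95996*0.9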
(b) (5 Points) Do you find evidence that more calories are associated with higher prices at the 10 percent significance level for a one-sided alternative? Do all 5 steps in hypothesis testing.
➡️ Step 1: Define hypotheses:
$$H_0:\beta =0$$$$H_1:\beta>0$$
Step 2: Test statistic:
$$t=\frac{\hat{\beta}-0}{SE}=\frac{0.185-0}{0.044}=4.2045$$
Step 3: Critical value:
The one-sided critical value of t for 600 degrees of freedom and $\alpha=0.10$ is $c=1.282964$
Step 4: Decision:
Since $t > c$, the null hypothesis is rejected.
Step 5: Conclusion:
At the 10 percent significance level, there is evidence that more calories are associated with higher prices.
t_statistics=0.185/0.044
t_statistics
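The critical value and a one-sided p-value for this test can be checked in R. This is a minimal sketch that assumes 600 degrees of freedom (602 observations minus the intercept and one slope); the variable names are illustrative and not part of the exam.

```r
# One-sided t test of H0: beta = 0 against H1: beta > 0
beta_hat <- 0.185
se_beta  <- 0.044
df       <- 602 - 2                        # assumed: intercept and one slope estimated

t_stat  <- beta_hat / se_beta              # test statistic
t_crit  <- qt(1 - 0.10, df)                # upper 10% critical value
p_value <- pt(t_stat, df, lower.tail = FALSE)

print(c(t_stat = t_stat, t_crit = t_crit, p_value = p_value))
reject <- t_stat > t_crit                  # TRUE means reject H0
print(reject)
```

Comparing `t_stat` with `t_crit` and comparing `p_value` with 0.10 are equivalent ways to reach the decision.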
(c) (5 Points) When we calculate the correlation between actual and predicted prices we obtain a value of 0.34. What is the regression’s goodness of fit measure adjusted R-squared’s value? Round to four decimal places.
➡️ The Coefficient of Determination, $R^2 = r^2 = 0.34^2 =0.1156$
Adjusted $R^2$ is calculated using the formula:
Adjusted $ R^2 = 1-\frac{(1-R^2)(N-1)}{N-p-1}$
Where, $N=602$ is the sample size and $p=1$ is the number of predictors.
$\Rightarrow$ Adjusted $ R^2 = 1-\frac{(1-0.1156)(602-1)}{602-1-1}=0.1141$
R_square=0.34**2
N=602;p=1;
AdjustedR_square=1-(1-R_square)*(N-1)/(N-p-1)
print(R_square)
print(AdjustedR_square)
[1] 0.1156
[1] 0.114126
(d) (5 Points) We run the regression of prices on (Calories-40) and add the level of saturated fat (measured in grams) to the regression. We note that the coefficient on saturated fat is -0.05 with a standard error of (0.01). Moreover, in that regression the estimated coefficient of (Calories-40) is now 0.06 with a standard error of (0.02). What does this tell you in terms of the correlation between the variable (Calories-40) and the variable saturated fat content of sandwiches?
➡️
The estimated coefficient on saturated fat is negative (-0.05 with a standard error of 0.01), so, holding calories fixed, more saturated fat is associated with a lower sandwich price. Once saturated fat is included, the estimated coefficient on (Calories-40) falls from 0.185 to 0.06.
The coefficient from the regression that omits saturated fat was therefore biased upward (positive omitted variable bias). The sign of that bias is the sign of the saturated fat coefficient times the sign of the correlation between saturated fat and (Calories-40); a negative coefficient together with a positive bias implies that the correlation between saturated fat and (Calories-40) is negative.
In general, when an omitted variable A has a negative effect on the response variable and omitting it biases the coefficient on variable B upward, A and B are negatively correlated.
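The sign argument can be made explicit with the standard omitted-variable identity for OLS. In the sketch below, $\tilde{\delta}$ is notation introduced here (it does not appear in the exam) for the slope from regressing saturated fat on (Calories-40), and the subscripts "short" and "long" label the regressions without and with saturated fat:

$$\hat{\beta}_{short} = \hat{\beta}_{long} + \hat{\beta}_{fat}\,\tilde{\delta} \;\Rightarrow\; 0.185 = 0.06 + (-0.05)\,\tilde{\delta} \;\Rightarrow\; \tilde{\delta} = \frac{0.185-0.06}{-0.05} = -2.5$$

A negative $\tilde{\delta}$ has the same sign as the sample correlation between saturated fat and (Calories-40), which is therefore negative.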
# Include any code used for EX1-(d) here. (Coding Cell) Final answers do not belong in this cell.
(e) (7 Points) We wish to test the null hypothesis that the coefficients on “Calories-40” is equal to 0.03 and the coefficient on “saturated fat” is equal to -0.03 at the 5% level. Please perform the 5 steps in hypothesis testing and conclude, given that the Residual standard error from the unrestricted regression is 1.9 and the Sum of Squared Residuals (SSR) for the restricted regression is 2850.
➡️ The number of Restrictions, $q=2$
Number of explanatory variables, $K=2$
Number of observations, $N=602$
Sum of squared residuals for the unrestricted regression, $SSR_{UR}=1.9^2*(N-K-1)=1.9^2*599=2162.39$
Hypothesis Testing
Step 1: Define hypotheses:
$H_0:\beta_1 =0.03$ and $\beta_2 =-0.03 $
$H_1:\beta_1 \ne 0.03$ or $\beta_2 \ne -0.03 $
Step 2: Test statistic:
$$F=\frac{(SSR_{R}-SSR_{UR})/q}{SSR_{UR}/(N-K-1)}=\frac{(2850-2162.39)/2}{2162.39/599}=95.2368$$
Step 3: Critical value:
The critical value of F for $\alpha=0.05$ and degrees of freedom $q=2, N-K-1=599$ is $c=3.011$
Step 4: Decision:
Since $F > c$, the null hypothesis is rejected.
Step 5: Conclusion:
At the 5% level we reject the joint null hypothesis: at least one of the two restrictions fails, so either the coefficient on 'Calories-40' is not 0.03 or the coefficient on 'saturated fat' is not -0.03 (or both).
1.9^2*599
(2850-2162.39)/(2*2162.39)*599
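The same F test can be reproduced end to end in R, including the critical value and a p-value. This is a sketch using only the numbers given in the question; the variable names are illustrative.

```r
# Joint F test of q = 2 linear restrictions
N <- 602; K <- 2; q <- 2
rse_ur <- 1.9                               # residual standard error, unrestricted model
ssr_r  <- 2850                              # SSR, restricted model

df_ur  <- N - K - 1                         # 599 degrees of freedom
ssr_ur <- rse_ur^2 * df_ur                  # SSR recovered from the residual standard error
F_stat <- ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)
F_crit <- qf(0.95, q, df_ur)                # 5% critical value
p_value <- pf(F_stat, q, df_ur, lower.tail = FALSE)

print(c(ssr_ur = ssr_ur, F_stat = F_stat, F_crit = F_crit, p_value = p_value))
```

Because the residual standard error equals $\sqrt{SSR_{UR}/(N-K-1)}$, the denominator of the F statistic is simply $1.9^2$.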
We estimated the following model where $price_{jt}$ is the price of good j at time t, and is a function of the calories ($Calories_{jt}$) of product j at time t using a subset of the same dataset with 36 observations. We now want to control for factors specific to each product and also factors changing year to year that could affect prices common to all products. We have 3 different types of sandwiches and 2 years in the data, and these sandwiches are sold in different regions of the country. Here is the regression you would use to estimate the effect of calories on prices controlling for those factors:
$$price_{jt} =~ ∝_0 ~+~ ∝_1 ~ type_1 ~+~ ∝_2 ~ type_2 ~+~ ∝_3 ~ type_3 ~+~ β ~ Calories_{jt} ~+~ γ ~ Year_2 ~+~ ε_{jt}$$
(a) (4 Points) Can you estimate all coefficients $∝_0, ∝_1, ∝_2, ∝_3, β, γ$ by OLS? Why or why not? Explain briefly with reference to collinearity.
➡️ Not all of the coefficients can be estimated.
Explanation:
OLS can estimate every coefficient only if there is no perfect multicollinearity among the regressors. Here every sandwich belongs to exactly one of the three types, so $type_1 + type_2 + type_3 = 1$ for every observation, which is perfectly collinear with the intercept $∝_0$ (the dummy variable trap). Hence one of the three type dummies (or the intercept) needs to be omitted.
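The collinearity can be seen numerically by checking the rank of the design matrix. The sketch below builds a hypothetical panel (3 types x 2 years x 6 regions = 36 rows, with made-up calorie values); none of these data come from the exam, only the structure of the regressors does.

```r
# Hypothetical panel: 3 sandwich types x 2 years x 6 regions = 36 observations
d <- expand.grid(type = 1:3, year = 1:2, region = 1:6)
set.seed(118)                                # arbitrary seed for the made-up calories
d$calories <- runif(nrow(d), 200, 800)

# Design matrix with the intercept AND all three type dummies
X_full <- cbind(intercept = 1,
                type1 = as.numeric(d$type == 1),
                type2 = as.numeric(d$type == 2),
                type3 = as.numeric(d$type == 3),
                calories = d$calories,
                year2 = as.numeric(d$year == 2))

# Same matrix with type3 dropped (type 3 becomes the base category)
X_drop <- X_full[, colnames(X_full) != "type3"]

rank_full <- qr(X_full)$rank
rank_drop <- qr(X_drop)$rank
print(c(columns = ncol(X_full), rank = rank_full))   # rank below the column count
print(c(columns = ncol(X_drop), rank = rank_drop))   # full column rank
```

When the rank is smaller than the number of columns, $X'X$ is not invertible and OLS cannot separate the intercept from the three type effects; dropping one dummy restores full column rank.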
(b) (4 Points) What is the variable $type1$ in this regression? Please define it and explain what it looks like in the data set.
➡️
Type 1 is a dummy variable for sandwich type 1; its coefficient is the fixed effect of sandwich type 1.
If the sandwich is Type 1 then $type1=1$, and $0$ otherwise, so in the data set it is a column of ones and zeros.
(c) (4 Points) Please interpret the meaning and statistical significance of the OLS estimate $\hat{∝_0}$ = 5 with a standard error of 1.8. (2-3 sentences max)
➡️
Assuming $type3$ is the omitted type, $\hat{∝_0}$=5 is the predicted price of a Type 3 sandwich in Year 1 (i.e. $Year_2=0$) with zero calories; it is the baseline against which the Type 1 and Type 2 fixed effects are measured. Its t-statistic is $5/1.8 \approx 2.78$, which exceeds the 5% two-sided critical value of about 2.04 (31 degrees of freedom), so it is statistically different from zero at the 5% level.
(d) (4 Points) Please interpret the meaning and statistical significance of the OLS estimate $\hat{∝_1}$ = -3 with a standard error of 5. (2-3 sentences max)
➡️
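The estimated coefficient $\hat{∝_1}$ = -3 is the fixed effect of Type 1 relative to the omitted type (assumed to be Type 3): holding calories and year fixed, a Type 1 sandwich is priced 3 units lower than a Type 3 sandwich. Its t-statistic is $-3/5=-0.6$, far below conventional critical values in absolute value, so this difference is not statistically significant.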
(e) (4 Points) Please interpret the meaning and statistical significance of the OLS estimate $\hat{γ}$ = 2.9 with a standard error of 1.6. (2-3 sentences max)
➡️
The estimated coefficient $\hat{γ}$ = 2.9 is the effect of Year 2 relative to Year 1: holding sandwich type and calories fixed, prices are 2.9 units higher in Year 2. Its t-statistic is $2.9/1.6 \approx 1.81$, so with 31 degrees of freedom it is not significant at the 5% level (critical value about 2.04) but is significant at the 10% level (critical value about 1.70).
(f) (4 Points) Please interpret the meaning and statistical significance of the OLS estimate $\hat{β}$ = 0.02 with a standard error of 0.002. (2-3 sentences max)
➡️
The estimated coefficient $\hat{\beta}$ = 0.02 is the effect of calories: holding sandwich type and year fixed, one additional calorie is associated with a 0.02 unit higher price of good $j$ in period $t$. Its t-statistic is $0.02/0.002=10$, so the effect is statistically significant at any conventional level.
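The statistical significance of the four estimates in (c) through (f) can be checked together with their t-ratios. The sketch below assumes that one type dummy is dropped, so five coefficients are estimated from 36 observations (31 degrees of freedom); the vector names are illustrative.

```r
# Estimates and standard errors given in parts (c)-(f)
est <- c(alpha0 = 5,   alpha1 = -3, gamma = 2.9, beta = 0.02)
se  <- c(alpha0 = 1.8, alpha1 = 5,  gamma = 1.6, beta = 0.002)

df <- 36 - 5                         # assumed: intercept, two type dummies, Calories, Year2
t_ratio <- est / se                  # t statistics for H0: coefficient = 0
t_crit  <- qt(0.975, df)             # two-sided 5% critical value
significant <- abs(t_ratio) > t_crit

print(round(t_ratio, 3))
print(t_crit)
print(significant)
```

Under these assumptions the intercept and the calories coefficient are significant at the 5% level, while the Type 1 and Year 2 coefficients are not.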
Now assume there are more than 3 sandwiches in our data (using all 602 observations). We learn that a random subset of the sandwiches in our sample were affected by a quasi-experiment. On half of them a sticker was added that explained that the workers in the restaurant earn a living wage because they operate in counties that were subject to a 15-dollar hourly minimum wage regulation. The other half of the sandwiches were served in regions that did not have a minimum wage of 15 dollars an hour (and whose sandwiches did not feature stickers). Moreover, regulation happened in Year 2 and did not happen in Year 1, so we can observe the prices of sandwiches before and after the minimum wage regulation is implemented and the sticker added to the sandwich wrapper in regulated counties. We also have access to data on the sandwiches’ nutritional characteristics and restaurant and employee characteristics in year 1.
(a) (5 Points) Below is a table showing the average prices in Year 1 and Year 2 in areas that received the minimum wage and disclosure regulations and in non-regulated areas. Please complete all the missing pieces in this table. Show any work in the box below and write all missing values in the table.
| Average Sandwich Prices | Regulated=0 | Regulated=1 | Difference (Reg-Not Regulated) |
|---|---|---|---|
| Year2 = 0 | $$4.2$$ | $$6$$ | $$(iii)$$ |
| Year2 = 1 | $$5$$ | $$ (ii) $$ | $$(iv)$$ |
| Difference (Year2 – Year1) | $$(i)$$ | $$3$$ | $$(v)$$ |
➡️ (i) = 0.8
➡️ (ii) = 9
➡️ (iii) = 1.8
➡️ (iv) = 4
➡️ (v) = 2.2
# Include any code used for EX3-(a) here. (Coding Cell) Final answers do not belong in this cell.
(b) (6 Points) Below is the equation we estimate to measure the causal effect of regulation on sandwich prices.
$$ \widehat{price} = \hat{δ_0} + \hat{δ_1} Year_2 + \hat{δ_2} Regulated + \hat{δ_3} Regulated*Year_2 $$
What are the values of all the estimated coefficients in this price equation given what you know in (a)? Place your answers next to the coefficients in the box below.
➡️ $\hat{δ_0} = 4.2$
➡️ $\hat{δ_1} =0.8 $
➡️ $\hat{δ_2} = 1.8$
➡️ $\hat{δ_3} = 2.2$
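The mapping from the four cell averages in the table to the four coefficients can be written out in R. This is a sketch: the cell means are those given in (a), with the regulated Year 2 mean recovered from the stated difference of 3, and the variable names are illustrative.

```r
# Average prices by group and year, from the table in (a)
unreg_y1 <- 4.2
unreg_y2 <- 5
reg_y1   <- 6
reg_y2   <- reg_y1 + 3               # (ii): regulated Year 2 mean from the given difference

delta0 <- unreg_y1                                        # baseline: unregulated, Year 1
delta1 <- unreg_y2 - unreg_y1                             # time effect for the unregulated group
delta2 <- reg_y1 - unreg_y1                               # pre-period gap between groups
delta3 <- (reg_y2 - reg_y1) - (unreg_y2 - unreg_y1)       # difference in differences

print(c(delta0 = delta0, delta1 = delta1, delta2 = delta2, delta3 = delta3))

# The saturated model reproduces every cell mean, e.g. regulated in Year 2:
fitted_reg_y2 <- delta0 + delta1 + delta2 + delta3
print(fitted_reg_y2)
```

Because the regression has one coefficient per cell, the fitted values equal the four cell averages exactly.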
# Include any code used for EX3-(b) here. (Coding Cell) Final answers do not belong in this cell.
(c) (2 Points) What is the impact analysis method used for causal identification in the equation in (b)? (1 sentence max)
➡️ The impact analysis method used for causal identification is Difference in Differences Method.
(d) (5 Points) Please test the null hypothesis that regulation has no causal effect on prices in a two-sided test at the 1 percent significance level given the provided standard errors: $\widehat{se(δ_0)}$ = 4.2, $\widehat{se(δ_1)}$ = 0.5, $\widehat{se(δ_2)}$ = 0.2, $\widehat{se(δ_3)}$ = 0.5. Use the 5 steps of hypothesis testing.
➡️ Hypothesis Testing
Step 1: Define hypotheses:
$$H_0:\delta_3 =0$$$$H_1:\delta_3 \ne 0$$
Step 2: Test statistic:
$$t=\frac{\hat{\delta_3}-0}{SE}=\frac{2.2-0}{0.5}=4.4$$
Step 3: Critical value:
The two-sided critical value of t is $c=2.584$ for $\alpha=0.01$ (598 degrees of freedom).
Step 4: Decision:
Since $|t| > c$, the null hypothesis is rejected.
Step 5: Conclusion:
At the 1 percent significance level, the regulation has a statistically significant causal effect on prices: it raised prices by an estimated 2.2.
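A quick numerical check of this test in R, as a sketch: it assumes all 602 observations are used and four coefficients are estimated, which gives 598 degrees of freedom. The variable names are illustrative.

```r
# Two-sided t test of H0: delta3 = 0 at the 1% level
delta3_hat <- 2.2
se_delta3  <- 0.5
df         <- 602 - 4                      # assumed: 602 observations, 4 coefficients

t_stat  <- delta3_hat / se_delta3          # ratio of estimate to standard error
t_crit  <- qt(1 - 0.01 / 2, df)            # two-sided 1% critical value
p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)
reject  <- abs(t_stat) > t_crit

print(c(t_stat = t_stat, t_crit = t_crit, p_value = p_value))
print(reject)
```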
# Include any code used for EX3-(d) here. (Coding Cell) Final answers do not belong in this cell.
(e) (5 Points) You do a balance test on the characteristics of regulated and unregulated sandwiches in year 1. The p-values for the equality of averages of your observable characteristics of sandwiches and restaurants in regulated and unregulated areas in year 1 are all greater than 0.4. Are you assured that the quasi-experimental randomization is present given the method you are using to measure the causal effect of regulation on prices? Why or why not, explain briefly. (3-4 sentences max)
➡️
Not fully. P-values above 0.4 mean we fail to reject equality of the average observable characteristics, so regulated and unregulated sandwiches and restaurants look balanced on observables in year 1, which is consistent with quasi-experimental randomization. However, failing to reject is not proof of balance, and the test says nothing about unobserved characteristics. Moreover, the difference-in-differences method relies on parallel trends in prices between the two groups in the absence of regulation, which a year 1 balance test on levels cannot verify.
The R output below shows the summary statistics for prices of 46 footlong sandwiches and 60 panini sandwiches.
avg_footlong <- mean(mydata\$price[which(mydata\$footlong==1)])
avg_panini <- mean(mydata\$price[which(mydata\$panini==1)])
avg_footlong
$\color{blue}{[1] ~ 3.44913}$
avg_panini
$\color{blue}{[1] ~ 4.22417}$
sd_footlong <- sd(mydata\$price[which(mydata\$footlong==1)])
sd_panini <- sd(mydata\$price[which(mydata\$panini==1)])
sd_footlong
$\color{blue}{[1] ~ 1.002813}$
sd_panini
$\color{blue}{[1] ~ 1.242687}$
(a) (8 Points) Test whether the average prices of footlong (footlong=1) sandwiches and panini sandwiches (panini=1) are equal at the 5% level. Use the 5 steps of hypothesis testing and round your answer to 4 decimal places.
➡️ Hypothesis Testing
Assuming Equal Variance
Step 1: Define hypotheses:
$$H_0:\mu_1 =\mu_2$$$$H_1:\mu_1 \ne \mu_2$$
Step 2: Test statistic:
$$t=\frac{\bar{x_1}-\bar{x_2}}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}=-3.4537$$
where $S_p$ is the pooled standard deviation $=1.1451$.
Degrees of freedom: $df=46+60-2=104$
Step 3: Critical value:
The two-sided critical value of t for 104 degrees of freedom and $\alpha=0.05$ is $c=1.983$
Step 4: Decision:
Since $|t|> c$, the null hypothesis is rejected.
Step 5: Conclusion:
At the 5% level, the average prices of footlong sandwiches and panini sandwiches are significantly different.
#Pooled Variance:
n1=46; n2=60;
s1=1.002813; s2=1.242687;
xbar1=3.44913; xbar2=4.22417;
sp=sqrt(((n1-1)*s1^2 +(n2-1)*s2^2)/(n1+n2-2))
print(sp)
t=(xbar1-xbar2)/(sp*sqrt(1/n1+1/n2))
print(t)
df=(46+60-2)
print(df)
[1] 1.145079
[1] -3.453744
[1] 104
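The pooled test assumes equal variances in the two groups. As a robustness check, the sketch below computes the pooled critical value and also the Welch (unequal variance) version of the test from the same summary statistics. Welch's test is an alternative introduced here and is not the method used in the answer above.

```r
# Summary statistics from the R output
n1 <- 46; n2 <- 60
s1 <- 1.002813; s2 <- 1.242687
xbar1 <- 3.44913; xbar2 <- 4.22417

# Critical value for the pooled test (104 degrees of freedom)
t_crit_pooled <- qt(0.975, n1 + n2 - 2)

# Welch test: separate variance estimates and Satterthwaite degrees of freedom
v1 <- s1^2 / n1
v2 <- s2^2 / n2
t_welch  <- (xbar1 - xbar2) / sqrt(v1 + v2)
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
t_crit_welch <- qt(0.975, df_welch)
reject_welch <- abs(t_welch) > t_crit_welch

print(c(t_crit_pooled = t_crit_pooled, t_welch = t_welch,
        df_welch = df_welch, t_crit_welch = t_crit_welch))
print(reject_welch)
```

Both versions lead to the same decision here, so the conclusion does not hinge on the equal variance assumption.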
(b) (4 Points) Construct a 99 percent confidence interval for the mean of prices for panini sandwiches. Round your answer to 4 decimal places.
➡️ The z-score for 99% confidence interval is 2.576. Therefore, the 99% Confidence Interval for the mean of prices for panini sandwiches is:
$( 4.22417-2.576*\frac{1.242687}{\sqrt{60}} , 4.22417+2.576*\frac{1.242687}{\sqrt{60}})$
$\Rightarrow (3.8109, 4.6374)$
# Include any code used for EX4-(b) here. (Coding Cell) Final answers do not belong in this cell.
4.22417-2.576*1.242687/sqrt(60)
4.22417+2.576*1.242687/sqrt(60)
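The interval above uses the normal critical value. With 60 observations one could also use the t distribution with 59 degrees of freedom, which gives a slightly wider interval. The sketch below computes both for comparison; the t-based version is an alternative shown here, not the approach taken in the answer.

```r
# 99% confidence interval for the mean panini price
n    <- 60
xbar <- 4.22417
s    <- 1.242687
se   <- s / sqrt(n)                         # standard error of the mean

z_crit <- qnorm(0.995)                      # normal critical value
t_crit <- qt(0.995, n - 1)                  # t critical value, 59 degrees of freedom

ci_z <- xbar + c(-1, 1) * z_crit * se
ci_t <- xbar + c(-1, 1) * t_crit * se

print(round(ci_z, 4))
print(round(ci_t, 4))
```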
We next want to understand the probability of individuals buying sandwiches made only from organic ingredients.
The R output below corresponds to the linear probability model of whether individuals bought organic sandwiches (choseOrganic = 1 or 0) as a function of whether individuals have children less than 6 years (kidslt6 =1 or 0) and the level of the individual’s education (educ) and family income (faminc).


(a) (2 Points) What is the estimate of the robust standard error for the less than 6 years children (kidslt6) parameter?
➡️ The estimate of the robust standard error for the parameter kidslt6 is 0.030003
(b) (3 Points) Please interpret whether education level significantly affects the probability of choosing organic, controlling for an individual having young (less than 6 years old) children and income.
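➡️ Education level significantly affects the probability of choosing a sandwich made only from organic ingredients. Holding young children and family income fixed, each additional year of education raises the probability of choosing organic by 0.0449 (about 4.5 percentage points), so four additional years of education raise it by 0.0449 x 4 = 0.18, or about 18 percentage points.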
Next, please consider the output below that estimates a logit model and the corresponding marginal effects.

(c) (4 Points) Please construct a 90 percent confidence interval for the average marginal effect (AME) of having children less than 6 years old on the probability of choosing organic, controlling for education and income.
➡️ The z-score for a 90% confidence interval is 1.645. Therefore, the 90% confidence interval for the average marginal effect of having children less than 6 years old on the probability of choosing organic is:
$(-0.2237-1.645*0.0328, -0.2237+1.645*0.0328) $
$\Rightarrow (-0.277656, -0.169744)$
# Include any code used for EX5-(c) here. (Coding Cell) Final answers do not belong in this cell.
-0.2237-1.645*0.0328
-0.2237+1.645*0.0328
(d) (3 Points) Why is this interval different from the 95 percent confidence interval reported above in the corresponding columns lower and upper? Explain briefly.
➡️ The interval is different because the calculated interval is for the 90% confidence level, while the reported lower and upper bounds are for the 95% confidence level. A lower confidence level uses a smaller critical value (1.645 instead of 1.96), so the 90% interval is narrower than the reported 95% interval.
(e) (3 Points) Does a one year increase in education significantly affect the probability of choosing organic in the logit model, all else equal? Explain.
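➡️ Yes. A one-year increase in education significantly increases the probability of choosing organic: the average marginal effect of education is about 0.0455, so the probability rises by about 4.55 percentage points per additional year, all else equal.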

It was a pleasure to teach you this Spring and we value your hard work and focus during this remote semester of econometrics in EEP 118. We really appreciate you filling out the evaluations and giving us feedback on things that worked well and what we can improve.
We hope you have a good end of semester and hope to meet you one day in person in the future.
All the best
Sofia, James, and Sung
From your EEP 118 Spring 2021 team