7 Twins and Returns to Schooling
Estimates of the returns to schooling for the Twinsburg twins sample (Ashenfelter and Krueger 1994; Ashenfelter and Rouse 1998). This replicates the analysis in Table 6.2 of Mastering ’Metrics.
library("tidyverse")
library("sandwich")
library("lmtest")
library("AER")
Load the twins data.
data("pubtwins", package = "masteringmetrics")
Run a regression of log wage on controls (Column 1 of Table 6.2).
mod1 <- lm(lwage ~ educ + poly(age, 2) + female + white, data = pubtwins)
coeftest(mod1, vcov = sandwich)
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.1791 0.1631 7.23 1.3e-12 ***
#> educ 0.1100 0.0104 10.54 < 2e-16 ***
#> poly(age, 2)1 4.9643 0.5697 8.71 < 2e-16 ***
#> poly(age, 2)2 -4.2957 0.5919 -7.26 1.1e-12 ***
#> female -0.3180 0.0397 -8.00 5.4e-15 ***
#> white -0.1001 0.0679 -1.47 0.14
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note: The age coefficients differ from (but are equivalent to) those reported in the table because poly(age, 2) calculates orthogonal polynomials.
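To see the correspondence, the same specification can be refit with raw age and age-squared terms (a sketch; mod1_raw is not part of the original replication). The fitted values and the educ coefficient are unchanged, and the age coefficients should then be on the scale reported in the book.
# Same model as mod1, but with raw polynomial terms for age
mod1_raw <- lm(lwage ~ educ + age + I(age^2) + female + white, data = pubtwins)
coeftest(mod1_raw, vcov = sandwich)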
Run a regression of the difference in log wage between twins on the difference in education, using one observation per twin pair (first == 1) (Column 2 of Table 6.2).
mod2 <- lm(dlwage ~ deduc, data = filter(pubtwins, first == 1))
coeftest(mod2, vcov = sandwich)
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.0296 0.0275 1.07 0.2835
#> deduc 0.0610 0.0198 3.09 0.0022 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
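For a sense of the precision of the differenced estimate, heteroskedasticity-robust confidence intervals can be obtained with lmtest::coefci() (a convenience sketch, not part of the original replication).
# Robust 95% confidence intervals for the differenced OLS estimate
coefci(mod2, vcov. = sandwich)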
Run a regression of log wage on controls, instrumenting education with the twin’s report of education, educt (Column 3 of Table 6.2).
mod3 <- ivreg(lwage ~ educ + poly(age, 2) + female + white |
. - educ + educt, data = pubtwins)
summary(mod3, vcov = sandwich, diagnostics = TRUE)
#>
#> Call:
#> ivreg(formula = lwage ~ educ + poly(age, 2) + female + white |
#> . - educ + educt, data = pubtwins)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.69585 -0.29218 0.00494 0.26262 2.47060
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.0636 0.2113 5.03 6.2e-07 ***
#> educ 0.1179 0.0137 8.62 < 2e-16 ***
#> poly(age, 2)1 5.0367 0.5805 8.68 < 2e-16 ***
#> poly(age, 2)2 -4.2897 0.5928 -7.24 1.3e-12 ***
#> female -0.3149 0.0403 -7.81 2.2e-14 ***
#> white -0.0974 0.0682 -1.43 0.15
#>
#> Diagnostic tests:
#> df1 df2 statistic p-value
#> Weak instruments 1 674 796.30 <2e-16 ***
#> Wu-Hausman 1 673 0.92 0.34
#> Sargan 0 NA NA NA
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.507 on 674 degrees of freedom
#> Multiple R-Squared: 0.338, Adjusted R-squared: 0.333
#> Wald test: 56.8 on 5 and 674 DF, p-value: <2e-16
Note: The coefficient for years of education is slightly different from that reported in the book.
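The weak-instruments statistic reported above is a first-stage F test of the excluded instrument educt. The first stage can also be fit directly (a sketch; fs3 is not part of the original replication), and the robust t statistic on educt should correspond to that diagnostic, with its square close to the first-stage F.
# First stage: own education regressed on the twin's report and the controls
fs3 <- lm(educ ~ educt + poly(age, 2) + female + white, data = pubtwins)
coeftest(fs3, vcov = sandwich)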
Run a regression of the difference in log wage between twins on the difference in education, instrumenting it with the difference in the twins’ cross-reported education, deduct (Column 4 of Table 6.2).
mod4 <- ivreg(dlwage ~ deduc | deduct,
data = filter(pubtwins, first == 1))
summary(mod4, vcov = sandwich, diagnostics = TRUE)
#>
#> Call:
#> ivreg(formula = dlwage ~ deduc | deduct, data = filter(pubtwins,
#> first == 1))
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2.0423 -0.3111 -0.0274 0.2471 2.0824
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.0274 0.0277 0.99 0.3237
#> deduc 0.1070 0.0339 3.15 0.0018 **
#>
#> Diagnostic tests:
#> df1 df2 statistic p-value
#> Weak instruments 1 338 85.15 <2e-16 ***
#> Wu-Hausman 1 337 4.12 0.043 *
#> Sargan 0 NA NA NA
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.512 on 338 degrees of freedom
#> Multiple R-Squared: 0.0132, Adjusted R-squared: 0.0103
#> Wald test: 9.94 on 1 and 338 DF, p-value: 0.00176
Note: The coefficient for years of education is slightly different from that reported in the book.
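Finally, the four schooling estimates can be collected in one place for comparison with Table 6.2 (a convenience sketch using the models fit above; the column labels are mine, not from the book).
# Gather the education coefficients from the four specifications
tibble(
  model    = c("(1) OLS", "(2) OLS, differenced", "(3) IV", "(4) IV, differenced"),
  estimate = c(coef(mod1)["educ"], coef(mod2)["deduc"],
               coef(mod3)["educ"], coef(mod4)["deduc"])
)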