7 Twins and Returns to Schooling
Estimates of the returns to schooling for Twinsburg twins (Ashenfelter and Krueger 1994; Ashenfelter and Rouse 1998). This replicates the analysis in Table 6.2 of Mastering ’Metrics.
library("tidyverse")
library("sandwich")
library("lmtest")
library("AER")
Load the twins data.
data("pubtwins", package = "masteringmetrics")
Run a regression of log wage on years of education and controls for age, age squared, sex, and race (Column 1 of Table 6.2).
mod1 <- lm(lwage ~ educ + poly(age, 2) + female + white, data = pubtwins)
coeftest(mod1, vcov = sandwich)
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.1791 0.1631 7.23 1.3e-12 ***
#> educ 0.1100 0.0104 10.54 < 2e-16 ***
#> poly(age, 2)1 4.9643 0.5697 8.71 < 2e-16 ***
#> poly(age, 2)2 -4.2957 0.5919 -7.26 1.1e-12 ***
#> female -0.3180 0.0397 -8.00 5.4e-15 ***
#> white -0.1001 0.0679 -1.47 0.14
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note: the age coefficients differ from (but are equivalent to) those reported in the table because poly(age, .) computes orthogonal polynomials; the remaining coefficients, including the one on educ, are unaffected by this reparameterization.
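To verify this, the model can be refit with raw age and age-squared terms; this is a small check, not part of the original replication. The schooling coefficient is unchanged, while the age terms are simply expressed in a different basis.
# Refit Column 1 with raw polynomial terms in age instead of orthogonal ones;
# the education coefficient is identical, only the age coefficients change.
mod1_raw <- lm(lwage ~ educ + age + I(age^2) + female + white, data = pubtwins)
all.equal(coef(mod1)[["educ"]], coef(mod1_raw)[["educ"]])
coeftest(mod1_raw, vcov = sandwich)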
Run a regression of the within-pair difference in log wage on the within-pair difference in years of education (Column 2 of Table 6.2). The sample is restricted to one observation per twin pair (first == 1).
mod2 <- lm(dlwage ~ deduc, data = filter(pubtwins, first == 1))
coeftest(mod2, vcov = sandwich)
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.0296 0.0275 1.07 0.2835
#> deduc 0.0610 0.0198 3.09 0.0022 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
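The within-pair estimate (about 0.06) is noticeably smaller than the levels estimate (about 0.11), in line with the book's discussion of ability bias. For convenience, a robust confidence interval for the differenced coefficient can be computed directly (a small addition, not part of Table 6.2):
# Heteroskedasticity-robust 95% confidence interval for the within-pair
# schooling coefficient, using the same sandwich estimator as above.
coefci(mod2, parm = "deduc", vcov = sandwich)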
Run a regression of log wage on education and the same controls, instrumenting education with the twin's education, educt (Column 3 of Table 6.2).
mod3 <- ivreg(lwage ~ educ + poly(age, 2) + female + white |
. - educ + educt, data = pubtwins)
summary(mod3, vcov = sandwich, diagnostics = TRUE)
#>
#> Call:
#> ivreg(formula = lwage ~ educ + poly(age, 2) + female + white |
#> . - educ + educt, data = pubtwins)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.69585 -0.29218 0.00494 0.26262 2.47060
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.0636 0.2113 5.03 6.2e-07 ***
#> educ 0.1179 0.0137 8.62 < 2e-16 ***
#> poly(age, 2)1 5.0367 0.5805 8.68 < 2e-16 ***
#> poly(age, 2)2 -4.2897 0.5928 -7.24 1.3e-12 ***
#> female -0.3149 0.0403 -7.81 2.2e-14 ***
#> white -0.0974 0.0682 -1.43 0.15
#>
#> Diagnostic tests:
#> df1 df2 statistic p-value
#> Weak instruments 1 674 796.30 <2e-16 ***
#> Wu-Hausman 1 673 0.92 0.34
#> Sargan 0 NA NA NA
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.507 on 674 degrees of freedom
#> Multiple R-Squared: 0.338, Adjusted R-squared: 0.333
#> Wald test: 56.8 on 5 and 674 DF, p-value: <2e-16
Note: the coefficient on years of education is slightly different from that reported in the book.
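The weak-instruments statistic reported above reflects the first stage; it can be inspected directly by regressing educ on the instrument and the exogenous controls (a sketch, not part of the original table):
# First stage for Column 3: own schooling on the instrument (educt) plus the
# exogenous controls. The strength of educt here underlies the
# weak-instruments diagnostic reported by summary().
fs3 <- lm(educ ~ educt + poly(age, 2) + female + white, data = pubtwins)
coeftest(fs3, vcov = sandwich)["educt", ]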
Run a regression of the within-pair difference in log wage on the difference in years of education, instrumenting the difference in education with the twin-reported difference, deduct (Column 4 of Table 6.2).
mod4 <- ivreg(dlwage ~ deduc | deduct,
data = filter(pubtwins, first == 1))
summary(mod4, vcov = sandwich, diagnostics = TRUE)
#>
#> Call:
#> ivreg(formula = dlwage ~ deduc | deduct, data = filter(pubtwins,
#> first == 1))
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2.0423 -0.3111 -0.0274 0.2471 2.0824
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.0274 0.0277 0.99 0.3237
#> deduc 0.1070 0.0339 3.15 0.0018 **
#>
#> Diagnostic tests:
#> df1 df2 statistic p-value
#> Weak instruments 1 338 85.15 <2e-16 ***
#> Wu-Hausman 1 337 4.12 0.043 *
#> Sargan 0 NA NA NA
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.512 on 338 degrees of freedom
#> Multiple R-Squared: 0.0132, Adjusted R-squared: 0.0103
#> Wald test: 9.94 on 1 and 338 DF, p-value: 0.00176
Note: the coefficient on years of education is slightly different from that reported in the book.
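As a wrap-up, the four schooling coefficients estimated above can be collected into a single table mirroring the layout of Table 6.2:
# Gather the schooling coefficient from each specification fit above.
tibble(
  column   = c("(1) OLS levels", "(2) OLS differences",
               "(3) IV levels", "(4) IV differences"),
  estimate = c(coef(mod1)[["educ"]], coef(mod2)[["deduc"]],
               coef(mod3)[["educ"]], coef(mod4)[["deduc"]])
)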