9 Quarter of Birth and Returns to Schooling

This replicates Tables 6.4 and 6.5, and Figures 6.1 and 6.2 of Mastering ’Metrics. These present an IV analysis of the returns to schooling using quarters of birth (QOB) as instruments for years of schooling (Angrist and Krueger 1991).

library("AER")
library("sandwich")
library("lmtest")
library("tidyverse")
library("broom")

Load twins data.

data("ak91", package = "masteringmetrics")

Some cleaning of the data.

ak91 <- mutate(ak91,
               qob_fct = factor(qob),
               q4 = as.integer(qob == "4"),
               yob_fct = factor(yob))

Table 6.4. IV recipe for returns to schooling using a single QOB instrument. Regress log wages on 4th quarter.

mod1 <- lm(lnw ~ q4, data = ak91)
coeftest(mod1, vcov = sandwich)
#> 
#> t test of coefficients:
#> 
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  5.89827    0.00136 4329.13   <2e-16 ***
#> q4           0.00681    0.00274    2.48    0.013 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Regress years of schooling on 4th quarter.

mod2 <- lm(s ~ q4, data = ak91)
coeftest(mod2, vcov = sandwich)
#> 
#> t test of coefficients:
#> 
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 12.74731    0.00661    1929  < 2e-16 ***
#> q4           0.09212    0.01316       7  2.6e-12 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

IV regression of log wages on years of schooling, with 4th quarter as an instrument for years of schooling.

mod3 <- ivreg(lnw ~ s | q4, data = ak91)
coeftest(mod3, vcov = sandwich)
#> 
#> t test of coefficients:
#> 
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)    4.955      0.358   13.85   <2e-16 ***
#> s              0.074      0.028    2.64   0.0083 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

9.1 Table 6.5

Regression Estimates of Returns to Schooling using Quarter of Birth Instruments

Column 1. OLS

mod4 <- lm(lnw ~ s, data = ak91)
coeftest(mod4, vcov = sandwich)
#> 
#> t test of coefficients:
#> 
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 4.995182   0.005074     984   <2e-16 ***
#> s           0.070851   0.000381     186   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Column 2. IV with only the 4th quarter as an instrument.

mod5 <- ivreg(lnw ~ s | q4, data = ak91)
summary(mod5, vcov = sandwich, diagnostics = TRUE)
#> 
#> Call:
#> ivreg(formula = lnw ~ s | q4, data = ak91)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -8.7765 -0.2393  0.0713  0.3326  4.6536 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)    4.955      0.358   13.85   <2e-16 ***
#> s              0.074      0.028    2.64   0.0083 ** 
#> 
#> Diagnostic tests:
#>                     df1    df2 statistic p-value    
#> Weak instruments      1 329507     48.99 2.6e-12 ***
#> Wu-Hausman            1 329506      0.01    0.91    
#> Sargan                0     NA        NA      NA    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.638 on 329507 degrees of freedom
#> Multiple R-Squared: 0.117,   Adjusted R-squared: 0.117 
#> Wald test: 6.97 on 1 and 329507 DF,  p-value: 0.00829

The argument diagnostics = TRUE will run an F-test on the first stage which is reported as the “Weak instruments” diagnostic.

Column 3. OLS. Controls for year of birth.

mod6 <- lm(lnw ~ s + yob_fct, data = ak91)
coeftest(mod6, vcov = sandwich)
#> 
#> t test of coefficients:
#> 
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  5.017348   0.006019  833.65  < 2e-16 ***
#> s            0.071081   0.000381  186.34  < 2e-16 ***
#> yob_fct31   -0.006387   0.005123   -1.25  0.21251    
#> yob_fct32   -0.014838   0.005052   -2.94  0.00331 ** 
#> yob_fct33   -0.017583   0.005068   -3.47  0.00052 ***
#> yob_fct34   -0.020999   0.005062   -4.15  3.3e-05 ***
#> yob_fct35   -0.032895   0.005039   -6.53  6.7e-11 ***
#> yob_fct36   -0.031781   0.004970   -6.39  1.6e-10 ***
#> yob_fct37   -0.036712   0.004894   -7.50  6.4e-14 ***
#> yob_fct38   -0.036890   0.004856   -7.60  3.1e-14 ***
#> yob_fct39   -0.048164   0.004833   -9.96  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Column 4. IV reg using only the 4th quarter as an instrument. Controls for year of birth.

mod7 <- ivreg(lnw ~ s + yob_fct | q4 + yob_fct, data = ak91)
summary(mod7, vcov = sandwich, diagnostics = TRUE)
#> 
#> Call:
#> ivreg(formula = lnw ~ s + yob_fct | q4 + yob_fct, data = ak91)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -8.7785 -0.2346  0.0719  0.3405  4.6687 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  4.96599    0.35393   14.03   <2e-16 ***
#> s            0.07520    0.02841    2.65   0.0081 ** 
#> yob_fct31   -0.00696    0.00647   -1.08   0.2819    
#> yob_fct32   -0.01557    0.00708   -2.20   0.0279 *  
#> yob_fct33   -0.01855    0.00833   -2.23   0.0259 *  
#> yob_fct34   -0.02209    0.00909   -2.43   0.0151 *  
#> yob_fct35   -0.03425    0.01061   -3.23   0.0012 ** 
#> yob_fct36   -0.03338    0.01208   -2.76   0.0057 ** 
#> yob_fct37   -0.03857    0.01368   -2.82   0.0048 ** 
#> yob_fct38   -0.03910    0.01596   -2.45   0.0143 *  
#> yob_fct39   -0.05053    0.01705   -2.96   0.0030 ** 
#> 
#> Diagnostic tests:
#>                     df1    df2 statistic p-value    
#> Weak instruments      1 329498     47.73 4.9e-12 ***
#> Wu-Hausman            1 329497      0.02    0.88    
#> Sargan                0     NA        NA      NA    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.638 on 329498 degrees of freedom
#> Multiple R-Squared: 0.117,   Adjusted R-squared: 0.117 
#> Wald test: 1.81 on 10 and 329498 DF,  p-value: 0.054

Column 4. IV reg using all quarters as instruments. Controls for year of birth.

mod8 <- ivreg(lnw ~ s + yob_fct | qob_fct + yob_fct, data = ak91)
summary(mod8, vcov = sandwich, diagnostics = TRUE)
#> 
#> Call:
#> ivreg(formula = lnw ~ s + yob_fct | qob_fct + yob_fct, data = ak91)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -8.9945 -0.2544  0.0676  0.3509  4.8425 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  4.59174    0.25057   18.32  < 2e-16 ***
#> s            0.10525    0.02012    5.23  1.7e-07 ***
#> yob_fct31   -0.01111    0.00591   -1.88  0.05988 .  
#> yob_fct32   -0.02089    0.00623   -3.35  0.00080 ***
#> yob_fct33   -0.02556    0.00698   -3.66  0.00025 ***
#> yob_fct34   -0.03007    0.00742   -4.05  5.1e-05 ***
#> yob_fct35   -0.04414    0.00836   -5.28  1.3e-07 ***
#> yob_fct36   -0.04501    0.00930   -4.84  1.3e-06 ***
#> yob_fct37   -0.05207    0.01034   -5.04  4.7e-07 ***
#> yob_fct38   -0.05518    0.01184   -4.66  3.1e-06 ***
#> yob_fct39   -0.06780    0.01259   -5.39  7.2e-08 ***
#> 
#> Diagnostic tests:
#>                     df1    df2 statistic p-value    
#> Weak instruments      3 329496     32.32  <2e-16 ***
#> Wu-Hausman            1 329497      2.98   0.084 .  
#> Sargan                2     NA      3.26   0.196    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.647 on 329498 degrees of freedom
#> Multiple R-Squared: 0.0905,  Adjusted R-squared: 0.0905 
#> Wald test: 3.79 on 10 and 329498 DF,  p-value: 3.9e-05

9.2 Figures

Summarize the average wages by age:

ak91_age <- ak91 %>%
  group_by(qob, yob) %>%
  summarise(lnw = mean(lnw), s = mean(s)) %>%
  mutate(q4 = (qob == 4))

Average years of schooling by quarter of birth for men born in 1930-39 in the 1980 US Census.

ggplot(ak91_age, aes(x = yob + (qob - 1) / 4, y = s)) +
  geom_line() +
  geom_label(mapping = aes(label = qob, color = q4)) +
  theme(legend.position = "none") +
  scale_x_continuous("Year of birth", breaks = 1930:1940) +
  scale_y_continuous("Years of Education", breaks = seq(12.2, 13.2, by = 0.2),
                     limits = c(12.2, 13.2))

Average log wages by quarter of birth for men born in 1930-39 in the 1980 US Census.

ggplot(ak91_age, aes(x = yob + (qob - 1) / 4, y = lnw)) +
  geom_line() +
  geom_label(mapping = aes(label = qob, color = q4)) +
  scale_x_continuous("Year of birth", breaks = 1930:1940) +
  scale_y_continuous("Log weekly wages") +
  theme(legend.position = "none")